Using DH to challenge women’s underrepresentation

By Joseph Soueidy

Cover picture from CIPD


As a previously elected student representative of the Faculty of Engineering at the American University of Beirut (AUB), I often thought about the underrepresentation of women (only 20% of representatives are women), but I never considered the possible reasons behind this gap, since I knew that it was not a matter of merit. What if I told you that Digital Humanities can play a role in explaining, and perhaps helping to solve, this underrepresentation, especially in universities with daily, weekly, and monthly journals such as AUB?

Textual Analysis and Gender

Before discussing the relation between textual analysis and gender, here is a brief definition of Digital Humanities (DH) and its tools. DH is the intersection of computational tools and the classical humanities. DH tools can be leveraged to analyze gender in texts and give readers insights they probably did not have before.

Here is an interesting article if you want to know more about it:

Textual analysis is a methodology that involves studying language in order to learn how people make sense of and communicate their life experiences. It can be used to analyze gender in texts, for example with bi-grams. To give you more context, an n-gram is a sequence of n items from a text, and a bi-gram is an n-gram with n=2. In gender analysis, bi-grams can mainly be used to look at the word that follows “he” or “she” and compare the differences.
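To make the bi-gram idea concrete, here is a minimal Python sketch that counts which word follows “he” and “she”. The function name `following_words` and the sample sentences are invented for illustration; they are not taken from any real corpus, and a dedicated tool such as AntConc would be used for a real study.

```python
import re
from collections import Counter

def following_words(text, targets=("he", "she")):
    """Count the word that immediately follows each target word."""
    # Lowercase the text and keep only alphabetic tokens (English-only sketch).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {target: Counter() for target in targets}
    # Each (first, second) pair is one bi-gram; pairs may span sentence boundaries.
    for first, second in zip(tokens, tokens[1:]):
        if first in counts:
            counts[first][second] += 1
    return counts

# Invented sample text, for illustration only.
sample = "He won the election. She organized the event. He said he won again."
result = following_words(sample)
for pronoun, counter in result.items():
    print(pronoun, counter.most_common(3))
```

Comparing the two counters side by side is the kind of “he” versus “she” comparison described above.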

This is really important, especially in the MENA region, given that we do not quantify the humanities in order to understand them better. Let us explore one prominent textual analysis tool that was leveraged in the region, more specifically in Egypt.

Use of AntConc at the American University in Cairo (AUC)

The students’ newspaper at AUC, The Caravan, was established in 1925 and is still publishing articles to this day. A textual analysis was conducted on these articles by AUC students.

The first step in analyzing these articles was to digitize the old printed articles using Optical Character Recognition (OCR). As the name suggests, this technology recognizes text within an image and converts it into an editable digital document. You can learn more about this concept by checking the link below:

This is a relatively hard task, especially with Arabic terms and low-quality printing. OCR software that you can use includes Adobe Acrobat Pro DC and ABBYY FineReader.

The second step was to clean up the data obtained from the OCR so that it could be used in the third step: the analysis of the resulting corpus (a collection of written texts that can be grouped by author, genre, and so on).
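The exact clean-up performed on the Caravan articles is not described here, but a few typical OCR fixes can be sketched in Python. The function `clean_ocr_text` and the rules it applies are assumptions chosen for illustration, not the method used by the AUC students.

```python
import re
import unicodedata

def clean_ocr_text(raw):
    """Apply a few common, generic fixes to raw OCR output."""
    # Fold compatibility characters, e.g. the "fi" ligature, into plain letters.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs (but not newlines) into one space.
    text = re.sub(r"[^\S\n]+", " ", text)
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

# Invented sample with a broken word, extra spaces, a tab, and a ligature.
sample = "The stu-\ndents   met\tat the of\ufb01ce."
print(clean_ocr_text(sample))
```

One known trade-off of this sketch: rejoining hyphenated line breaks also merges genuine compounds such as “well-known” when they happen to be split at the end of a line, so the output still needs a human check.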

AntConc can be used to determine the frequency of the words in the corpus. As a first, general analysis, this reveals any strikingly repetitive words that might be of interest for our gender analysis.


To move forward with the analysis, AntConc gives the user the option to supply a list of stopwords, which removes words that are not relevant to the analysis, such as “the”, “if”, and so on.
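The same two ideas, word frequency and stopword filtering, can be sketched in Python. This is not AntConc's implementation; the stopword list, the function name `word_frequencies`, and the sample sentence are all invented for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "if", "and", "a", "of", "to", "in"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each word occurs, ignoring stopwords."""
    # \w+ also matches Arabic letters; diacritics would need extra handling.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)

# Invented sample text, for illustration only.
sample = "The men and the women of the council met. The women spoke."
freq = word_frequencies(sample)
print(freq.most_common(3))
```

Without the stopword filter, “the” would dominate the counts and hide the words that matter for a gender analysis.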

Finally, when searching for keywords such as “women”, “men”, “he”, “she”, etc., you can check the context in which they are used by clicking on Concordance, and see whether, for example, women are normally associated with inferiority whereas men are glorified. Here is an example also from the Caravan:
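Independently of the Caravan example, the idea behind a concordance view (often called keyword in context) can be sketched in a few lines of Python. The function `concordance`, its `window` parameter, and the sample text are invented for illustration and do not reproduce AntConc's own behavior.

```python
import re

def concordance(text, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            # Take up to `window` words on each side of the keyword.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

# Invented sample text, not taken from The Caravan.
sample = "The women of the club hosted a debate. Many women attended the debate."
for left, keyword, right in concordance(sample, "women"):
    print(f"{left:>20} | {keyword} | {right}")
```

Reading the left and right columns for “women” and then for “men” is a simple way to spot the kind of contrast in associations described above.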

A limitation of AntConc is that, even though it can analyze Arabic texts, it does not have an Arabic interface, which would make it more convenient for Arabic speakers. Here is an n-gram analysis of a random Arabic text that I personally conducted in AntConc, to show how easily we can analyze Arabic articles and quantify facts so that we can visualize and understand them better. This will be key to beginning to address women’s underrepresentation by raising awareness of it through quantified facts.

Impact of Textual Analysis

The Caravan is a student-led journal and, hence, it reflects how students express themselves, which could be somewhat biased, especially if men’s voices are the only ones being properly represented. By analyzing these articles, students can now see and understand that they underrepresent the women in the community whose voices they should be sharing. These articles are not the only sources that have not yet been analyzed by scholars or researchers.

American University of Beirut (AUB) Journals

The Great Unread, a term popularized by the literary scholar Franco Moretti, refers to the huge amount of primary source material that is not analyzed by scholars. This material includes a wide range of publications, from novels to university journals.

At AUB, there are several student publications, such as Watchdogs Gazette, Phoenix Daily, Coggs and Caffeine, and AUB Outlook. These publications mostly feature male accomplishments, written up by both male and female writers, which unconsciously reinforces the dominance of men in several areas. The number of articles written by male authors or about male figures is just one small example; a textual analysis should be conducted to reveal further discrepancies that affect people in one way or another. This can provide students with insights that may affect the representation of women in the AUB council, for example, given that this year women make up 20% of the council. But one thing is for sure: if we don’t analyze the data we have, we will never know!

Edited by Daniella Razzouk
