University of Copenhagen researcher Isabelle Augenstein and colleagues trawled through 3.5 million fiction and non-fiction books, all published in English between 1900 and 2008, in an effort to find out whether there is a difference between the types of words used to describe men and women in literature.
“We are clearly able to see that the words used for women refer much more to their appearances than the words used to describe men,” Dr. Augenstein said.
“Thus, we have been able to confirm a widespread perception, only now at a statistical level.”
Using a new computer model, Dr. Augenstein and her co-authors from University College London, Johns Hopkins University, Microsoft Research and the University of Cambridge analyzed a dataset of 3.5 million books.
The scientists extracted adjectives and verbs associated with gender-specific nouns (e.g. ‘daughter’ and ‘stewardess’), as in combinations such as ‘sexy stewardess’ or ‘girls gossiping.’
They then analyzed whether the words had a positive, negative or neutral sentiment, and subsequently which categories the words could be divided into.
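To give a rough feel for the two steps described above, here is a minimal Python sketch. It is not the team's model: the noun lists, the adjective list, the tiny sentiment lexicon and the sample sentences are all hypothetical and invented for illustration, and the pairing rule (an adjective directly before a gendered noun) is a deliberate simplification.

```python
from collections import Counter

# Hypothetical word lists, for illustration only (not taken from the study).
FEMALE_NOUNS = {"daughter", "stewardess", "girl", "girls", "woman"}
MALE_NOUNS = {"son", "steward", "boy", "boys", "man"}
# Toy sentiment lexicon: +1 positive, -1 negative; missing words count as neutral.
SENTIMENT = {"beautiful": 1, "brave": 1, "rational": 1, "sexy": 1, "cruel": -1}
ADJECTIVES = set(SENTIMENT) | {"tall", "young"}


def count_adjective_pairs(sentences):
    """Count (gender, adjective) pairs where the adjective directly precedes a gendered noun."""
    counts = Counter()
    for sentence in sentences:
        # Crude tokenization: lowercase, drop periods and commas, split on whitespace.
        tokens = sentence.lower().replace(".", "").replace(",", "").split()
        for prev, word in zip(tokens, tokens[1:]):
            if prev not in ADJECTIVES:
                continue
            if word in FEMALE_NOUNS:
                counts[("female", prev)] += 1
            elif word in MALE_NOUNS:
                counts[("male", prev)] += 1
    return counts


def sentiment_label(adjective):
    """Map an adjective to 'positive', 'negative' or 'neutral' using the toy lexicon."""
    score = SENTIMENT.get(adjective, 0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample sentences standing in for book text.
sample = [
    "The beautiful daughter greeted the brave son.",
    "A sexy stewardess and a rational man talked.",
    "The young girl met a cruel boy.",
]
pairs = count_adjective_pairs(sample)
for (gender, adjective), n in sorted(pairs.items()):
    print(gender, adjective, sentiment_label(adjective), n)
```

On real text, a part-of-speech tagger or syntactic parser would be needed to find adjectives and verbs reliably; the fixed word lists here only stand in for that step.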
Adjectives, with sentiment, used to describe men and women, as represented by the team’s model. Colors indicate the most common sense of each adjective; black indicates out of lexicon. Two patterns are immediately apparent: positive adjectives describing women are often related to their bodies, while positive adjectives describing men are often related to their behavior. Image credit: Hoyle et al.
“Our analyses demonstrate that negative verbs associated with body and appearance are used with five times the frequency for females than males,” they said.
“The analyses also demonstrate that positive and neutral adjectives relating to the body and appearance occur approximately twice as often in descriptions of females, while males are most frequently described using adjectives that refer to their behavior and personal qualities.”
‘Beautiful’ and ‘sexy’ were two of the adjectives most frequently used to describe women; commonly used descriptors for men included ‘righteous,’ ‘rational’ and ‘brave.’
“Although many of the books were published several decades ago, they still play an active role,” Dr. Augenstein said.
“As artificial intelligence and language technology become more prominent across society, it is important to be aware of gendered language,” she added.
“We can try to take this into account when developing machine-learning models by either using less biased text or by forcing models to ignore or counteract bias. All three things are possible.”
The researchers presented their findings July 29 at the 2019 Annual Meeting of the Association for Computational Linguistics in Florence, Italy.