Tuesday, August 9, 2011

The Wikipedia Gender Gap, Part II

In part I we talked about the small percentage of female editors in Wikipedia and their shorter editing life span. In this part we'll talk about the content areas female and male editors focus on, the coverage of female- and male-related topics, and involvement in editing controversial entries.

Content areas
The authors divided the data from the January 2008 data dump into 8 main areas: Arts, Geography, Health, History, Science, People, Philosophy and Religion. Then, they checked the focus areas of each editor's activity. The authors found that men focused more on Geography and Science, while women focused more on People and Arts.

Figure: Gender distribution of editors in eight interest areas, January 2008. Editors can be categorized into more than one area.

The reason these data look different than those presented earlier is that they are taken from a different data pool (2008 as opposed to the more recent data used earlier).

Topic Coverage
Are female-related topics covered in Wikipedia as well as male-related topics? The authors used their gender data to determine whether an article is of more interest to women or to men. Since there are so few female editors, the metrics were "subject to high relative variance and noise", so they used only high-activity articles where gender was known for at least 30 editors. Articles shorter than 100 bytes were excluded because they usually redirected to other articles. The authors ended up with a sample of 59,579 articles.

Articles were declared "male" if they were in the bottom quintile (lowest 20%) of female editing activity, "neutral" if they were in the third (center) quintile, and "female" if they were in the top quintile.
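The filtering and labeling rules above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, the numbers are made up, and I'm assuming "female editing activity" means the share of an article's known-gender edits made by women, which the summary above doesn't spell out.

```python
from typing import Dict


def is_eligible(known_gender_editors: int, length_bytes: int) -> bool:
    """Keep articles with at least 30 known-gender editors and at least 100 bytes."""
    return known_gender_editors >= 30 and length_bytes >= 100


def label_by_quintile(female_share: Dict[str, float]) -> Dict[str, str]:
    """Label each article by the quintile of its female editing share.

    Bottom quintile -> "male", middle quintile -> "neutral",
    top quintile -> "female", everything else -> "other".
    """
    # Rank articles from lowest to highest female share (assumed metric).
    ranked = sorted(female_share, key=female_share.get)
    n = len(ranked)
    names = {0: "male", 2: "neutral", 4: "female"}
    labels = {}
    for rank, title in enumerate(ranked):
        quintile = rank * 5 // n  # 0 = lowest 20%, 4 = highest 20%
        labels[title] = names.get(quintile, "other")
    return labels


# Toy data: ten made-up articles with female shares 0.00, 0.01, ..., 0.09.
toy_shares = {f"article_{i}": i / 100 for i in range(10)}
toy_labels = label_by_quintile(toy_shares)
```

Note that the second and fourth quintiles get no label at all under this scheme, so two fifths of the sample drop out of the male/neutral/female comparison.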

Male articles are significantly longer than female articles (33,301 and 28,434 bytes respectively, t-test, p < 0.001). Neutral articles are the longest at 36,511 bytes. Since the authors used the articles' length as a crude measurement of quality, they concluded that coverage of female topics is indeed lacking. They hypothesized that neutral articles are longer because they appeal to editors of both genders and therefore receive more overall attention.
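For readers who haven't met a t-test before, here is a minimal sketch of how the test statistic for comparing two groups of article lengths might be computed. I'm assuming Welch's version (which doesn't require equal variances); the summary above doesn't say which variant the authors used, the function name is my own, and the byte lengths below are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    # Standard error of the difference, built from the sample variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Made-up article lengths in bytes (NOT the paper's data).
toy_male_lengths = [31000, 35500, 33200, 34100, 32700]
toy_female_lengths = [27900, 29400, 28100, 28800, 27500]

toy_t = welch_t(toy_male_lengths, toy_female_lengths)
```

A large positive value means the first group's mean is well above the second's relative to the noise in both samples. Turning the statistic into a p-value additionally needs the t distribution, which is left out here.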

For an additional analysis, the authors used the movie recommender web site MovieLens, which has self-reported gender information from over 80% of users who started using MovieLens before May 2003 (when they stopped asking about gender). 32% of the site's users were female. The authors mapped each movie to its Wikipedia article and excluded movies with fewer than 10 known-gender raters or movies which had no article. The remaining data set included 5,850 movies. Article Length was the dependent variable, "Movie Gender" the independent variable, and Movie Popularity, Movie Quality and Movie Age were the control variables. Articles about "male" movies were longer than those about "female" movies.

However, when articles about Nobel Prize winners and recipients of the Academy Award for Best Actor/Actress were analysed, they were found to be about equal in length. So, the length gender gap isn't noticeable for very popular and/or important articles.

Controversial Topics
The authors hypothesized that "Females tend to avoid controversial or contentious articles." They identified controversial articles by whether they were protected, reasoning that Wikipedia tends to lock articles which are often vandalized or subject to content disputes. 5.20% of the “female” articles were protected, compared with 2.39% of the “male” articles. So, contrary to the hypothesis, female editors are actually more likely to be involved in controversial articles, at least by this measure.

Next time: are women less likely to be blocked? Are edits by women more likely to be reverted?

Lam, S., Uduwage, A., Dong, Z., Sen, S., Musicant, D. R., Terveen, L., & Riedl, J. (2011). WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance. WikiSym’11, October 3–5, Mountain View, California.



  1. Link to Part I is broken and I cannot find it elsewhere on the blog...

  2. Ugh, you're right. I fixed the link at the post and here it is just in case:

    Thanks for telling me!