Computational Sociolinguistics Tutorial

10 July 2017, Cologne, International Conference on Computational Social Science

Language is one of the main instruments by which people construct their identity and manage their social network. With the rise of social media and the growing use of large-scale text analysis to study social phenomena, there has been a surge of interest in analyzing and modeling the social dimension of language using computational approaches. This tutorial provides a comprehensive overview of the emerging field of Computational Sociolinguistics.


  • Introduction

    An introduction to this research area and why it is relevant for computational social science.
  • Language and Social Identity Construction

    Language is one of the instruments that people use in shaping their identities. Recognizing that language use can reveal social patterns, various studies have explored the task of automatically inferring social variables (e.g., gender, age) from text. This part of the tutorial will cover prediction studies (e.g., can we automatically identify the gender of authors based on their language use?) as well as large-scale text analyses. Practical aspects, such as data collection (e.g., obtaining labels), will also be covered. This part also discusses how various NLP tools, such as sentiment detection, can be improved by accounting for language variation.
  • Language and Social Interaction

    People do not act in isolation; they are part of pairs, groups and communities. Topics that will be discussed include the automatic extraction of social relationships from text, approaches to measure and analyze style shifting, and how community members adapt their language to conform to, or sometimes diverge from, community norms.

  • Multilingualism and Social Interaction

    Many people are multilingual and use multiple languages in their daily communication, sometimes even within a single conversation. The choice of language depends on a variety of factors, such as the audience and topic. This part discusses computational approaches to process, model and analyze multilingual communication.

  • Methods, Data and Ethical Challenges

    A reflection on methodological challenges that arise from the interdisciplinary character of this research area, with a focus on how NLP can support theory building and explanation. Furthermore, the tutorial will discuss challenges related to data (e.g., biases) and will highlight relevant ethical challenges.

  • Open Research Challenges


Basic machine learning knowledge will be assumed, but the tutorial will be accessible to an interdisciplinary audience, some of whom may be new to computational linguistics.

Slides (pdf)



Dong Nguyen is a research fellow at the Alan Turing Institute. She has a PhD degree from the University of Twente and a master’s degree from the Language Technologies Institute at Carnegie Mellon University. Her research has been featured in various media outlets, including Time Magazine and the New York Times. She is also affiliated with Edinburgh University.