In many areas of social science, we would like to use machine learning models to make better decisions. However, many machine learning models are opaque or “black-box,” meaning that they do not explain their predictions in a way humans can understand. This lack of transparency is problematic: it raises questions about possible model biases and leaves accountability for incorrect decisions unclear. Interpretable or “glass box” machine learning models give insight into model decisions and can be used to build fairer and more accurate models. Interpretability in machine learning is crucial for high-stakes decisions and for troubleshooting. Interpretable machine learning dates back at least to the 1970s but has gained momentum as a subfield only recently. We will survey recent research in the area, present fundamental principles for interpretable machine learning, and provide hands-on activities highlighting the use of these techniques on real-world data. This tutorial will introduce the frontier of interpretable machine learning and equip researchers and scientists with the knowledge and skills to apply it in their research for effective data analysis and responsible decision-making.
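To make the idea of a “glass box” model concrete, the following minimal sketch fits a shallow decision tree with scikit-learn and prints its rules; the bundled dataset stands in for the real-world data used in the hands-on activities and is not the tutorial's own material.

```python
# A minimal sketch of an interpretable ("glass box") model: a shallow decision
# tree whose fitted rules can be read end to end. The dataset is an illustrative
# stand-in, not the tutorial's data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting the depth keeps the decision logic small enough to inspect.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print(f"held-out accuracy: {tree.score(X_test, y_test):.2f}")
# The rules themselves are the explanation: every prediction traces back to a
# short chain of human-readable threshold comparisons.
print(export_text(tree, feature_names=list(X.columns)))
```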
As we head into a crucial election year in the U.S. and Europe, political forces such as populist and radical parties, movements, and authoritarian governments abroad, together with societal processes such as polarization and declining trust in parliaments, pose threats to the legitimacy of democratic institutions. Academic research on the role of digital media in shaping these processes and democracy at large is thriving. Nonetheless, our research questions and analytical possibilities are often confined to the data provided by platforms. For democracy research in particular, it is important to link digital behavioral data with individual-level information on demographics and variables such as party identification, political trust, or evaluations of other societal groups. While obtaining individual-level data has long been a challenge, platforms have recently restricted data access even further. As it is not yet clear what remedies the EU’s Digital Services Act will provide, researchers have to devise their own solutions for collecting relevant digital behavioral data in the “Post-API Age”. In many academic institutions, research software for data collection, such as web tracking via browser add-ons, mobile apps, or data donation tools, is being developed. Currently, there is a risk that these initiatives remain unconnected and that work is duplicated. The workshop aims to bring together research groups working on new technical solutions and innovative approaches for studying digital democracy.
In the digital age, social media platforms have become crucial for societal interaction and communication. Computational social science, especially social media research, has yielded crucial insights such as detecting bots, identifying suspicious activities, and uncovering narratives. Underlying these findings is the combination of large-scale data and network science techniques that reveal user connectivity and interactions.

There have been significant shifts in the landscape of computational social science research in recent years. New restrictions on the data access policies of widely used platforms pose significant challenges to the types of research that can be conducted. On the other hand, emerging platforms that offer open data access, like Bluesky and Mastodon, have seen a surge in popularity, opening new opportunities for investigation. Additionally, the rapid development of large language models (LLMs) provides new ways to represent and understand published content. The Observatory on Social Media (OSoMe) addresses these challenges and opportunities by developing data acquisition tools for emergent platforms, providing historical and synthetic datasets, and developing novel data analysis tools and techniques.

This tutorial aims to guide participants through these new developments, highlighting current approaches for accessing social media data, including the use of OSoMe's infrastructure to acquire social media data or generate data from a model of a social media platform, and methodologies for understanding this data. Attendees will learn to build various network types, extending beyond traditional interactions like replies and re-posts to include co-post and co-hashtag networks, enabling diverse data representations for different research needs. The tutorial will cover network science techniques, including basic network features, centrality measures, and community detection, along with techniques for building and analyzing text-based embeddings, such as those generated by the Sentence-BERT method. The tutorial will also cover techniques to extract narratives and attribute content-aware labels to communities.

Moreover, participants will be guided through advanced visualization tools like Helios-Web, enhancing their ability to explore and disseminate their findings effectively. The tutorial will be conducted in Python and will utilize Jupyter notebooks preloaded with datasets and scripts. These materials will be open-source and available on GitHub, providing participants with a toolkit to kickstart or advance their social media research endeavors.
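As a small illustration of one representation mentioned above, the sketch below builds a co-hashtag network with networkx and runs community detection on it; the `posts` structure is an assumed stand-in for data retrieved through OSoMe tools, not the tutorial's notebooks.

```python
# A minimal sketch of a co-hashtag network plus simple community detection.
# The `posts` list is an illustrative placeholder for collected social media data.
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

posts = [
    {"id": 1, "hashtags": ["election", "debate", "policy"]},
    {"id": 2, "hashtags": ["election", "polls"]},
    {"id": 3, "hashtags": ["climate", "policy"]},
]

G = nx.Graph()
for post in posts:
    # Connect every pair of hashtags that co-occur in the same post,
    # accumulating an edge weight for repeated co-occurrences.
    for a, b in combinations(set(post["hashtags"]), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Basic network features and communities, of the kind covered in the tutorial.
print(nx.degree_centrality(G))
communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])
```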
Researchers from the IC2S2 community often struggle with a lack of access to data about online behavior. This challenge is even more pressing now that several APIs are closing. At the same time, in our everyday lives we as individuals leave more and more digital traces behind on digital platforms: for example, by liking a post on Instagram or sending a message via WhatsApp, or when we tap our electronic card on public transportation or complete an online banking transaction. The promise of digital humanities and computational social science is that researchers can utilize these digital traces to study human behavior and social interaction at an unprecedented level of detail. In short, while the amount of digital trace data keeps increasing, most of it is closed off in the proprietary archives of commercial corporations, with only a subset available to a small set of researchers at a platform’s discretion, or through increasingly restricted and opaque APIs.

This tutorial helps IC2S2 researchers understand and deploy an alternative that circumvents these challenges. This approach to gaining access to digital traces is enabled by the GDPR’s right to data access and data portability and by similar legislation in other countries. As a result, all data-processing entities are required to provide citizens with a digital copy of their personal data upon request. We refer to these collections of personal data as Data Download Packages (DDPs). This legislation allows researchers to invite participants to share their DDPs. A major challenge, however, is that DDPs potentially contain very sensitive data; moreover, often not all of the data is needed to answer a specific research question. To tackle these challenges, an alternative workflow has been developed: First, the participant requests their personal DDP from the platform of interest. Second, they download it onto their own personal device. Third, by means of local processing, only the features of interest to the researcher are extracted from the DDP. Fourth, the participant inspects the extracted features and chooses what they want to donate (or declines to donate). Only after the participant selects the data for donation and clicks the ‘donate’ button is the data sent to a storage location, where it can be accessed by the researcher and used for further analysis.

After participating in this tutorial, attendees will know what designing a data donation study entails and which important aspects should be considered. Attendees will learn about the different types of study designs in which data donation can be incorporated. Furthermore, attendees will learn how to configure their own data donation study using the open-source software Port and how to write their own Python scripts for the extraction of digital trace data.
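To illustrate the third step of the workflow, the sketch below shows the kind of local extraction script that turns a raw DDP into a small set of aggregate features. The file name, archive layout, and field names are assumptions for illustration; real DDPs differ per platform, and Port wraps such extraction logic in its own study flow.

```python
# A minimal sketch of local feature extraction from a (hypothetical) DDP zip:
# only coarse aggregates are produced for the participant to review and donate.
import json
import zipfile
from collections import Counter

def extract_features(ddp_path: str) -> dict:
    """Count likes per month from an assumed likes.json inside a DDP archive."""
    with zipfile.ZipFile(ddp_path) as zf:
        with zf.open("likes.json") as f:   # hypothetical file inside the DDP
            likes = json.load(f)
    # Keep only non-sensitive aggregates; raw content never leaves the device.
    per_month = Counter(item["timestamp"][:7] for item in likes)
    return {"likes_per_month": dict(per_month)}

if __name__ == "__main__":
    print(extract_features("my_instagram_ddp.zip"))  # placeholder path
```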
Social scientists with data science skills are increasingly assuming positions as computational social scientists in academic and non-academic organizations. However, as computational social science (CSS) is still relatively new to the social sciences, CSS can feel like a hidden curriculum for many Ph.D. students. To support social science Ph.D. students, we provide an accessible tutorial for CSS training based on our collective working experiences in academic, public, and private sector organizations. We argue that students should supplement their traditional social science training in research design and domain expertise with CSS training, focused on three core areas: (1) learning data science skills; (2) building a portfolio that uses data science to answer social science questions; and (3) connecting with computational social scientists. We conclude with some practical recommendations for departments and professional associations to better support Ph.D. students.

The paper form of this tutorial was published in PS: Political Science and Politics, the American Political Science Association’s professionalization journal, and has been viewed 2,317 times and downloaded 584 times since August 2023 (as of December 23, 2023).
Our tutorial will guide participants through the practical aspects and hands-on experiences of using Large Language Models (LLMs) in Computational Social Science (CSS). In recent years, LLMs have emerged as powerful tools capable of executing a variety of language processing tasks in a zero-shot manner, without the need for task-specific training data. This capability presents a significant opportunity for the field of CSS, particularly in classifying complex social phenomena such as persuasiveness and political ideology, as well as coding or explaining new social science constructs that are latent in text. This tutorial provides an in-depth overview of how LLMs can be used to enhance CSS research. First, we will provide a set of best practices for prompting LLMs, an essential skill for effectively harnessing their capabilities in a zero-shot context. This part of the tutorial assumes no prior background. We will explain how to select an appropriate model for the task, and how factors like model size and task complexity can help researchers anticipate model performance. To this end, we introduce an extensive evaluation pipeline, meticulously designed to assess the performance of different language models across diverse CSS benchmarks. By covering these results, we will show how CSS research can be broadened to a wider range of hypotheses than prior tools and data resources could support. Second, we will discuss some of the limitations of prompting as a methodology for certain measurement scales and data types, including ordinal data and continuous distributions. This part will look more “under the hood” of a language model to outline challenges around decoding numeric tokens, probing model activations, and intervening on model parameters. By the end of this session, attendees will be equipped with the knowledge and skills to effectively integrate LLMs into their CSS research.
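As a small illustration of zero-shot labeling by prompting, the sketch below sends one classification prompt through the OpenAI Python client. The model name, label set, and prompt wording are illustrative assumptions, not the tutorial's evaluation pipeline.

```python
# A minimal sketch of zero-shot text labeling via prompting. Assumes the
# openai Python package (v1.x) and an OPENAI_API_KEY in the environment;
# the model name and labels are placeholders.
from openai import OpenAI

client = OpenAI()

def label_ideology(text: str) -> str:
    prompt = (
        "Classify the political ideology expressed in the following text.\n"
        "Answer with exactly one word: liberal, conservative, or neutral.\n\n"
        f"Text: {text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",          # assumed model; any chat model works
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                # deterministic output aids evaluation
    )
    return resp.choices[0].message.content.strip().lower()

print(label_ideology("We must cut taxes and shrink the federal government."))
```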
A deluge of digital content is generated daily by web-based platforms and sensors that capture digital traces of communication and connection, and complex states of society, the economy, the human mind, and the physical world. Emerging deep learning methods enable the integration and analysis of these complex data in order to address research and real-world problems by designing and discovering successful solutions. Our tutorial serves as a companion to our book, “Thinking with Deep Learning”. This book takes the position that the real power of deep learning is unleashed by thinking with deep learning to reformulate and solve problems traditional machine learning methods cannot address. These include fusing diverse data like text, images, tabular and network data into integrated and comprehensive “digital doubles” of the subjects and scenarios you want to model, the generation of promising recommendations, and the creation of AI assistants to radically augment an analyst or system’s intelligence. For scientists, social scientists, humanists, and other researchers who seek to understand their subjects more deeply, deep learned representations facilitate the opportunity to not only predict and simulate them but also to provide novel insights, associations, and understanding available for analysis and reuse.

The tutorial will walk attendees through various non-neural representations of social text, image, and network data, and the distance metrics we can use to compare these representations. We then introduce neural models and their use in modern science and computing, with a focus on the social sciences. After introducing neural architectures, we will explore how they are used with various multi-modal social data, and how their power can be unleashed by integrating and aligning these representations.
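As a concrete illustration of the first step described above, the sketch below builds a non-neural (TF-IDF) representation of a few short texts and compares them with cosine similarity; the example sentences are placeholders, not material from the book or tutorial.

```python
# A minimal sketch: non-neural text representations (TF-IDF) and a distance
# metric (cosine similarity) between them. The documents are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The city council approved the new housing policy.",
    "Residents protested the council's housing decision.",
    "The local team won the championship game.",
]

X = TfidfVectorizer().fit_transform(docs)   # documents as sparse TF-IDF vectors
sim = cosine_similarity(X)                  # pairwise similarity matrix

# Documents about the same topic land closer together than unrelated ones.
print(sim.round(2))
```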
This tutorial will teach attendees about Active Inference as an agent-based modeling framework and its application to computational social science. Active Inference integrates neuroscience and cognitive science into a normative theory of biological, and thus also social and cultural, phenomena, with empirical validation at the neuronal level. Recent applications in areas such as economics, psychology, and sociology include multi-armed bandit models, cooperative action, approach-avoid behavior, and confirmation bias. Active Inference’s novelty lies in its integration of perception, “changing one’s mind,” with action, “changing the world,” via free energy minimization as a single cost function. This overcomes the ‘passive’ approach of recent generative AI (e.g., LLMs) to learning and data generation, and it allows agents to balance exploration (acting to seek information) with exploitation (acting to realize one’s preferences), offering a novel approach to the exploration-exploitation debate in the social sciences. The framework can be adapted to both single- and multi-agent settings, and running simulations of these agents and inspecting their modifiable, interpretable parameters allows for deriving insights about their actions, beliefs, and outcomes over time. After a primer on Active Inference, this tutorial teaches attendees how to copy and adapt an open-source Python script (which can be run within, or downloaded from, Google Colab in the browser) for their own experimental simulations, which can be shared for reproducibility and, depending on the experiment, fit to real empirical data.
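To make the single cost function concrete, the self-contained sketch below (not the tutorial's Colab script) computes the expected free energy of two actions in a toy two-armed bandit and picks the one that minimizes it, showing how epistemic value (information gain) trades off against pragmatic value (preferences). All numbers and the bandit setup are illustrative assumptions.

```python
# A self-contained numpy sketch of expected free energy minimization.
# Hidden state: which arm pays off. Outcomes: reward / no reward.
import numpy as np

def expected_free_energy(qs, A_a, log_C):
    """G(a) given beliefs qs over states, likelihood A_a[o, s] = P(o | s, a),
    and log-preferences over outcomes log_C[o]."""
    qo = A_a @ qs                                   # predicted outcomes Q(o | a)
    epistemic = 0.0
    for o, po in enumerate(qo):
        posterior = A_a[o] * qs / po                # Bayes: Q(s | o, a)
        # Expected KL from prior to posterior beliefs = information gain.
        epistemic += po * np.sum(posterior * np.log(posterior / qs))
    pragmatic = qo @ log_C                          # expected log-preference
    return -epistemic - pragmatic                   # the agent minimizes G

qs = np.array([0.5, 0.5])           # uncertain belief about which arm is better
A = {                                # rows: [reward, no reward]
    "safe arm":        np.array([[0.6, 0.6], [0.4, 0.4]]),   # uninformative
    "informative arm": np.array([[0.9, 0.1], [0.1, 0.9]]),   # reveals the state
}
log_C = np.array([1.0, 0.0])         # mild preference for observing a reward

G = {a: expected_free_energy(qs, A_a, log_C) for a, A_a in A.items()}
print(G)
print("chosen:", min(G, key=G.get))  # with mild preferences, exploration wins
```

Increasing the preference strength (e.g., `log_C = np.array([4.0, 0.0])`) tips the same computation toward the safe arm, illustrating the exploration-exploitation balance described above.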
The dark web remains mysterious, with many struggling to comprehend its nature. While some perceive it as a breeding ground for crime and injustice, others view it as an essential resource for individuals worldwide facing censorship, bigotry, and oppression. Opinions on the technology are divisive, yet it undeniably exists and is utilized by millions daily. Embraced by a spectrum of users, from cybercriminals to political dissidents to ordinary suburban parents, it offers a fascinating arena for studying diverse human behaviors and exploring the intricate intersections of core ethics and values.

This workshop aims to provide participants with an understanding of the dark web—its functioning and how to access it. It will delve into the ethical and legal considerations surrounding the dark web, offering a rich terrain for research. The session will also showcase various techniques for conducting such research. By the end of the tutorial, participants will possess the knowledge and skills required to engage with the dark web as a platform for their own research endeavors.
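As one small illustration of programmatic access, the sketch below routes an HTTP request through a locally running Tor client's SOCKS proxy using the requests library. The .onion address is a placeholder, and any real collection should follow the legal, ethical, and institutional-review considerations discussed in the workshop.

```python
# A minimal sketch of fetching an onion service via a local Tor SOCKS proxy.
# Requires: pip install "requests[socks]" and a running Tor client (port 9050).
import requests

proxies = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h resolves hostnames via Tor,
    "https": "socks5h://127.0.0.1:9050",  # which is required for .onion domains
}

url = "http://example.onion/"  # placeholder address for illustration only
response = requests.get(url, proxies=proxies, timeout=60)
print(response.status_code)
print(response.text[:500])
```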