Computational Analysis of Text for APSA 2018

Proposal for a Research Cafe for APSA 2018: "Computational Analysis of Text and Big Data: Advances in Comparative Politics and Beyond". Submitted to Comparative Politics (2nd preference Methods) on January 4th, 2018.

Recent advances in the computational analysis of text, together with the exponential growth of online data repositories, including social media, have rapidly expanded the data available to political scientists for answering old and new questions of interest to the discipline. In this round-table, participants will outline some of the ways in which they have brought these new tools and possibilities to their own research, and will seek cross-pollination between the techniques they study and the fields they represent. Human rights, protest, propaganda, elections, political violence, and deliberation are among the areas being transformed by the creative appropriation of techniques developed by computer scientists and digital humanities scholars to retrieve information from large bodies of online text. We will especially aim to open up dialogues with scholars relying on more traditional data gathering methods. We will ask how the new approaches fit into current empirical approaches in different fields of political science, and whether innovative solutions can be deployed in tandem with traditional data gathering techniques to further research by expanding the quality and quantity of data available to scholars.

Nikolay Marinov, University of Mannheim

Marinov can speak about a number of applications of text analysis to issues in comparative politics and international relations. One is an original dataset of economic sanctions, and of discussion of economic sanctions, generated by applying machine-learning methods to U.S. Congressional documents and Presidential statements. A related one uses a variety of techniques, including named-entity recognition, to understand when American policy-makers discuss other countries’ elections, leaders, human rights and commitments to international treaties. A third application concerns how to link all this information to information available on the web about countries’ elections, including who ran and how competitive the election was. The result is a new body of data, uniquely suited to answer questions of interest to political scientists of all stripes, including questions about American foreign policy, sources of influence on other countries’ human rights, and the electoral outcomes we observe around the world.

Anita Gohdes, University of Zurich

Gohdes can speak about using supervised machine learning for text classification in different types of data projects. An advantage of supervised methods is that researchers can establish a clear codebook driven by theoretical concepts decided on a priori. For example, supervised ML can be applied to qualitative accounts of individual instances of human rights violations to establish more fine-grained measures of violence in contentious environments. Gohdes used supervised ML to classify 60 thousand individual records of fatalities in the Syrian conflict to establish whether individuals were killed in a targeted or indiscriminate way. In a different project, she and co-authors used a small hand-labelled training set of social media posts to classify all Twitter and Facebook posts shared by world leaders. While supervised methods have many advantages, their performance depends on a number of important factors that will be a subject of discussion in the research cafe.
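
For readers less familiar with the supervised workflow described above, the following is a minimal sketch of the general idea, not the method used in any of the projects mentioned. It assumes a multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. The function names (`tokenize`, `train`, `predict`) and the four training strings are invented for illustration and do not come from any real dataset; only the two category labels echo the codebook example in the text.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labelled records (invented for illustration only).
TRAIN = [
    ("shot by sniper at checkpoint", "targeted"),
    ("executed after arrest at home", "targeted"),
    ("killed by shelling of the market", "indiscriminate"),
    ("died in air strike on neighbourhood", "indiscriminate"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labelled):
    """Count documents per class and tokens per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAIN)
print(predict(MODEL, "air strike on the market"))
print(predict(MODEL, "arrest at checkpoint"))
```

In practice the labels would come from the a priori codebook, the training set would be far larger, and performance would be checked on held-out hand-coded records before classifying the full corpus.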

Rochelle Terman, Stanford University

Rochelle can discuss her experiences applying computational tools and techniques to issues of culture, norms, and identity. These topic areas have historically relied on qualitative and/or critical methods, but recent advances in computational methods have provided new opportunities for engagement. Rochelle will discuss her use of text-as-data methods, webscraping, and other techniques to examine American media coverage of women's rights and gender norms around the world. She will also discuss her experience as a data science instructor to students across the social sciences and humanities, who apply these techniques to a range of substantive topics using a variety of empirical and epistemological approaches.

Walter Mebane, University of Michigan

Mebane can speak about using Twitter to extract individuals' observations of election incidents across large elections. Automated machine classification methods in an active learning framework have so far been used for the 2016 election in the United States (including primaries, caucuses and the general election) to classify Tweets for relevance and by type of election incident. Even though humans use both text and images to decide how to label Tweets, the machine classifiers currently use text only. Mebane will discuss ongoing work to build neural networks that use both text and images. The project also uses a database of Tweet and user information to support analysis of the data. For example, the user database is useful for filtering out both bots and users identified as bad actors created by Russia, as well as for developing attributes of individual users and of networks of users. For the general election, we develop hundreds of thousands of incident observations from 16.5 million raw Tweets; these observations occur at varying rates in different states, vary over time and by type, and depend on state election and demographic conditions.

Pamela Ban, Harvard

Pamela will discuss how she uses text-as-data methods on congressional text sources to shed new light on theories of congressional politics and organization. Much of the existing empirical work on Congress revolves around using roll call voting data or Congressional Record speech data, which largely limits empirical analyses to the floor-voting stage. Pamela will discuss how she uses new text datasets of committee speeches and committee reports to open up the black box of the congressional committee stage. In particular, using these text sources, she constructs measures of disagreement during the committee stage and investigates how this disagreement affects committee decisions and subsequent floor voting. She explains how incentives present in a strong committee system can lead legislators to deviate in their voting and contribute to bipartisanship. More broadly, she will discuss how using text-as-data can help us understand deliberation processes in Congress.