Facebook Dataset to Improve Social Science
This blog was published in BlueSci - the Cambridge University Science Magazine. The original can be viewed here.
Social science research asks fascinating questions about human behaviour. However, it often suffers from a lack of adequate data, partly because it is difficult to find enough people to take part in experiments or complete questionnaires, and partly because of social desirability bias, the tendency to answer survey questions in ways that will be viewed favourably by others. A large new observational dataset, shared by Gary King and Nathaniel Persily in February, may offer a solution to such problems. The dataset spans more than two years and summarizes information about 38 million URLs shared on Facebook, including whether the links were fact-checked, flagged or shared by users without being viewed, as well as the types of users who interacted with them.
Through the Social Science One initiative, researchers are invited to apply for access to this unique dataset to study the effect of social media on elections and democracy. A common concern with open social media data is user privacy. King and Persily addressed this with differential privacy: the data are anonymized, and statistical noise and censoring are introduced to prevent re-identification of any individual represented in the data. It is likely that more datasets like this will be created and shared in the future, allowing social scientists to ask broader questions and to answer them in more naturalistic ways whilst preserving individuals’ privacy.
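For readers curious about what "introducing statistical noise" can look like in practice, here is a minimal Python sketch of the textbook Laplace mechanism applied to a single count. It is an illustration only and is not claimed to be the procedure used for the Facebook dataset; the function names, the example count and the `epsilon` values are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws, each with
    mean `scale`, follows a Laplace distribution centred on zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy version of `true_count` (textbook Laplace mechanism).

    epsilon     -- privacy budget; smaller values mean more noise
    sensitivity -- assumed maximum change in the count caused by
                   adding or removing one person (1 for a simple count)
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical example: a link that was really shared 1,000 times.
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(private_count(1000, eps), 1))
```

Any single released number is slightly wrong, so it reveals little about whether one particular person contributed to it, while the average over many such numbers stays close to the truth. That trade-off is what lets researchers study aggregate patterns without identifying individuals.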