This week, a handful of university research teams will get access to a new Facebook tool designed to aggregate near-real-time data across the world's largest social network.
When it comes to who has access to Facebook's data and how, the company now known as Meta is still feeling the fallout from the 2018 Cambridge Analytica scandal, in which a political consulting firm harvested personal data from millions of unwitting Facebook users to build detailed profiles of potential voters. The company shut down thousands of APIs over the following three years and is only now beginning to restore broad access for academic research.
TechCrunch previewed Facebook's new Researcher API and spoke with Facebook Product Manager Kiran Jagadeesh, who led the project with the Facebook Open Research and Transparency (FORT) team.
"This is just the start," Jagadeesh told TechCrunch, calling the Researcher API a beta of the toolkit the team hopes to eventually deliver. The API, first announced at F8 this year, is Python-based and runs in JupyterLab, an open-source notebook interface.
In light of Facebook's many past privacy controversies, the new Researcher API comes with a few initial caveats. First, the API will be available only to a small group of established academic researchers through an invitation-only system. The company plans to expand access beyond the initial test group in February 2022, folding feedback from the test into a wider launch for all academics.
Another precaution: the Researcher API operates in a tightly controlled environment that Jagadeesh described as a "digital clean room." University researchers with access to the API can enter the environment through a Facebook VPN, collect data and crunch numbers, but they cannot export the raw data, only their analysis.
The idea is to protect user privacy and prevent analyzed data from being re-identified, but the limitation could rub some of the company's critics the wrong way: all of the public data the Researcher API collects is already out there, it is simply difficult to aggregate and analyze with Facebook's existing tools.
At launch, the API will provide access to four real-time Facebook datasets: Pages, Groups, Events and Posts. In each case, the tool will pull only public data, and initially only from sources located in the US and the EU. For Groups and Pages, at least one administrator must be located in a supported country for the data to be available through the API.
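To make the eligibility rules above concrete, here is a minimal Python sketch of how such a filter could be expressed. It is purely illustrative: the `Source` class, the `is_available` function and the `"US"`/`"EU"` region codes are hypothetical names, not part of Facebook's Researcher API, and treating a Page's or Group's location as determined by its administrators is our reading of the rules as described.

```python
from dataclasses import dataclass, field

# Hypothetical region codes; "EU" stands in for any EU member state.
SUPPORTED_REGIONS = {"US", "EU"}


@dataclass
class Source:
    """Hypothetical record describing one Page, Group, Event or Post."""
    kind: str                      # "page", "group", "event" or "post"
    is_public: bool
    location: str                  # region code of the source itself
    admin_locations: list = field(default_factory=list)


def is_available(src: Source) -> bool:
    """Illustrative check mirroring the launch rules described in the article."""
    # Only public data is ever pulled.
    if not src.is_public:
        return False
    # Assumption: for Pages and Groups, eligibility hinges on having
    # at least one administrator in a supported region.
    if src.kind in ("page", "group"):
        return any(loc in SUPPORTED_REGIONS for loc in src.admin_locations)
    # Other sources must themselves be located in a supported region.
    return src.location in SUPPORTED_REGIONS


if __name__ == "__main__":
    print(is_available(Source("page", True, "BR", ["US", "BR"])))
    print(is_available(Source("group", True, "US", ["BR"])))
    print(is_available(Source("post", False, "US")))
```

In this sketch a Brazil-based Page with one US administrator qualifies, while a Group whose only administrator is outside the supported regions does not.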
With this tool, researchers can analyze large swaths of plain text using methods such as sentiment analysis, which tracks the valence and emotion people express in their writing on a given topic. Beyond the text of posts, which makes up most of the available data, researchers can also access related information such as descriptions of Groups and Pages, their creation dates, and reactions to posts.
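As a rough illustration of the sentiment analysis mentioned above, the Python sketch below scores posts against a tiny word-valence lexicon and averages the results for a topic. The lexicon, its weights and the function names are hypothetical; real studies typically rely on validated lexicons or trained models, and nothing here reflects how Facebook's tool itself works.

```python
import re

# Hypothetical toy lexicon mapping words to valence scores.
VALENCE = {
    "good": 1.0, "great": 2.0, "love": 2.0, "hope": 1.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0, "fear": -1.0,
}


def sentiment_score(text: str) -> float:
    """Average valence of the lexicon words found in `text` (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [VALENCE[t] for t in tokens if t in VALENCE]
    return sum(hits) / len(hits) if hits else 0.0


def mean_sentiment(posts: list) -> float:
    """Mean per-post score across a collection of posts on one topic."""
    if not posts:
        return 0.0
    return sum(sentiment_score(p) for p in posts) / len(posts)


if __name__ == "__main__":
    posts = [
        "I love this new park, great for kids",
        "Terrible traffic again, I hate the new layout",
        "No strong feelings either way",
    ]
    for p in posts:
        print(f"{sentiment_score(p):+.2f}  {p}")
    print(f"topic mean: {mean_sentiment(posts):+.2f}")
```

Averaging per post rather than pooling all words keeps one long post from dominating a topic's overall score.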
Multimedia data such as raw images will not be included, nor will comments or user demographics (age, gender, etc.). The API also will not collect data from Instagram, although Jagadeesh acknowledges that the platform is very valuable to researchers and that the team is exploring ways to make Instagram data available.
The FORT team hopes to work closely with university researchers to build out and refine the current tools, which Jagadeesh describes as a work in progress. While Meta has not named its initial group of academic partners, the company has invited researchers from 23 academic institutions around the world to kick the tires.
Researchers who completed the team's onboarding process and accepted its privacy policies were granted access on Monday, November 15. Facebook requires anyone accessing the research tools to accept privacy constraints, including a commitment not to re-identify specific people in the data.
The Researcher API is currently available only to a handful of academic institutions, but the FORT team plans to explore granting access to other groups, including journalists. The goal is to create a public roadmap that gives researchers and journalists a transparent overview of what the team is working on.
The company has a lot of work to do to build trust with the research community. In August, Facebook cut off two prominent researchers affiliated with NYU's Cybersecurity for Democracy project from its advertising data, drawing rebukes from many academics and regulators. Those researchers focused on tracking disinformation and political advertising through a browser extension called Ad Observer. In September, Facebook apologized to an elite group of researchers known as Social Science One for giving them incomplete data, a mistake that undermined months of work and analysis.