Social media can reveal the psychological states of an entire population, according to new research.
The results show that through machine learning (teaching a computer to identify and analyze patterns in large datasets), researchers can see, in principle, how a society is doing in real time.
“These methods really show how to do psychological measurement in the 21st century in our digital world,” says Johannes Eichstaedt, an assistant professor of psychology at Stanford University and a faculty fellow at the Institute for Human-Centered Artificial Intelligence.
For the past decade, Eichstaedt has tested how to use social media, including Twitter, as a way to measure the well-being of a community. He contends that social media provides the largest data set on behavior, emotions, and thoughts in human history.
The researchers acknowledge in the paper that Twitter users are not representative of the US population, but the platform can still provide insight into how people experience their everyday lives.
“What we really care about is how well the population is doing in terms of psychological and physical health, rather than merely that the GDP is growing,” says Eichstaedt.
“You might not care about measuring subjective well-being in and of itself, but subjective well-being impacts mortality, including heart disease. It also impacts the economic bottom lines. So, it’s quite an important variable to capture for a population.”
From surveys to social media
To evaluate the different ways to analyze a region’s well-being, Eichstaedt and a team of researchers compared more than a billion geotagged Tweets posted between 2009 and 2015 with 1.7 million responses from the Gallup-Sharecare Well-Being Index, an in-depth survey that measures how people experience everyday life.
Researchers have long relied on surveys like Gallup’s to measure a population’s well-being. While accurate, such surveys can be costly and time-consuming undertakings. Sometimes it takes years to gather enough data for rough community estimates, says Eichstaedt.
But augmenting that process with data-driven techniques can alleviate some of that burden. Eichstaedt found that researchers can train an algorithm on two things from the same respondents: their answers to a written well-being survey and a sample of their social media posts. They can then deploy the trained algorithm on a much larger scale, using only Tweets to predict how people across an entire region would have responded to a traditional survey.
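The train-then-deploy idea can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual pipeline: the respondents, posts, survey scores, county names, vocabulary, and the choice of a small ridge-penalized linear model are all assumptions made up for this example.

```python
# Toy sketch only: every post, score, and region below is invented.
from collections import Counter

def word_freqs(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in vocab]

def train_linear(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit y ~ w.x + b by batch gradient descent with a small ridge penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(model, x):
    """Predicted survey score for one feature vector."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Step 1: respondents who answered a survey AND shared posts (hypothetical).
respondents = [
    ("so excited for the weekend fun with friends", 8.0),
    ("fun day at the park excited to go again", 7.5),
    ("tired and bored nothing to do", 4.0),
    ("bored again so tired of this", 3.5),
]
vocab = ["excited", "fun", "tired", "bored"]  # assumed toy vocabulary

X = [word_freqs(text, vocab) for text, _ in respondents]
y = [score for _, score in respondents]
model = train_linear(X, y)

# Step 2: deploy on posts from regions with no survey data (hypothetical).
region_posts = {
    "county_a": ["excited about the fun fair", "what a fun game"],
    "county_b": ["so bored today", "tired of waiting bored"],
}
estimates = {
    region: sum(predict(model, word_freqs(p, vocab)) for p in posts) / len(posts)
    for region, posts in region_posts.items()
}
print(estimates)
```

The key design point is that survey answers are needed only for the training respondents; the regional estimates come from posts alone.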
Language is key
Before they used machine learning methods, researchers either picked words or asked raters to annotate words for how “positive” they are. It can be very tricky, however, to pick words that measure well-being, says Eichstaedt.
For example, the researchers found that internet slang such as “LOL” (the popular acronym for “laugh out loud”) and the words “good” and “love” were frequently used in areas with lower income and education (and, in general, lower well-being). So even though these might seem like positive words, they may not signal higher well-being, Eichstaedt says.
Similarly, words like “homework” and “taxes” might seem negative out of context, but the researchers found that they were used more by people with higher education and income, a group that other studies have found to typically have higher well-being.
“When picking words to measure well-being, it’s really important to pay attention to cultural differences in language use across the US,” says Eichstaedt.
But machine learning methods can help determine which words are more important than others. When the algorithm compared a person’s social media posts against their survey responses, it learned that words like “LOL” are not reliable indicators of well-being and relied instead on words such as “fun” and “excited.”
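The contrast between hand-picked words and learned words can be illustrated with a small Python sketch. It assumes a simple Pearson correlation between word frequency and survey score as a stand-in for whatever the researchers' algorithm does. The respondents and scores are invented and deliberately constructed to mirror the pattern described above, so they are not evidence of anything.

```python
# Toy sketch only: posts and scores are invented to mirror the article's example.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rel_freq(text, word):
    """Share of a text's tokens that equal the given word."""
    tokens = text.lower().split()
    return tokens.count(word) / max(len(tokens), 1)

# Hypothetical respondents: (sample of posts, self-reported well-being score).
respondents = [
    ("lol love this good show lol", 4.0),
    ("lol good one love it", 4.5),
    ("excited to finish homework then fun tonight", 8.0),
    ("taxes done so excited fun weekend ahead", 7.5),
]
scores = [s for _, s in respondents]

# Approach 1: a rater's hand-picked "positive" word list (an assumption).
hand_picked = {"lol", "good", "love"}
dictionary_scores = [
    sum(rel_freq(text, w) for w in hand_picked) for text, _ in respondents
]

# Approach 2: let the data decide how each word relates to the survey score.
candidates = ["lol", "good", "love", "fun", "excited"]
learned = {
    w: pearson([rel_freq(text, w) for text, _ in respondents], scores)
    for w in candidates
}

print(dictionary_scores)
print(learned)
```

In this toy data the hand-picked list rates the lower-scoring respondents as the most "positive," while the learned associations are negative for “lol” and positive for “fun” and “excited.”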
“Having the computer learn the words may be the best way to find words that measure well-being,” Eichstaedt says. “Differences in language use can be quite complex.”
Health and well-being
The researchers note that well-being is also associated with other important factors, including overall health. For example, stress can drive people to unhealthy behaviors, such as excessive drinking or smoking, that in turn negatively affect their health, Eichstaedt says.
“When people are suffering from depression and anxiety, we need to know so that we can ensure they have the resources they need,” says Eichstaedt, who is currently applying this method to study the impact of the novel coronavirus pandemic on the population of cities across the US.
“COVID-19 is a natural disaster that interrupts our social norms and routines at an unprecedented scale,” Eichstaedt says. “With this real-time Twitter-based technology, psychologists can monitor if loneliness and anxiety are taking hold in communities, and how our well-being is impacted by social distancing.
“There is no other data source that can provide such measurement at population scale and give estimates so quickly. Now more than ever, using robust machine learning methods is very important.”
The research appears in the Proceedings of the National Academy of Sciences.
Co-authors on the paper include Kokil Jaidka of the National University of Singapore; Salvatore Giorgi and Lyle H. Ungar of the University of Pennsylvania; H. Andrew Schwartz of Stony Brook University; and Margaret L. Kern of the University of Melbourne. Support for this research was provided by a Nanyang Presidential Postdoctoral Award, an Adobe Research Award, a Robert Wood Johnson Foundation Pioneer Award, and a Templeton Religion Trust Grant.
Source: Stanford University