Tweeting the economy: Analyzing social media sentiments and macroeconomic indicators

This research aims to examine the correlation between social media sentiment and the Consumer Confidence Index (CCI) as well as Gross Domestic Product (GDP) in Indonesia. Data were collected through web scraping from Twitter (now also known as X) spanning from 2019 to 2022 on a monthly basis. Using Pearson and Kendall’s Tau correlation tests, the study found that the correlation between Twitter sentiment and the CCI is not significant. However, there is a significant correlation between Twitter sentiment from news accounts and GDP. The findings indicate that the views and perceptions expressed in social media sentiment, particularly from news accounts on Twitter, could serve as an initial indicator of Indonesia's GDP.


Introduction
In recent years, social media has emerged as one of the most powerful and influential tools for conveying messages, facilitating communication, and sharing information (Chu & Chen, 2019; Chwialkowska, 2019; Walsh, 2020). Initially utilized as a means to interact with friends and family, social media has evolved into a significant platform for collecting the latest views, perceptions, and societal experiences (Gan et al., 2020; Jain et al., 2021; Lou et al., 2019). Notably, platforms like Twitter, now also recognized as X, enable individuals to swiftly and easily express their thoughts, views, and opinions (Thompson, 2018; Yarış & Aykol, 2022), fostering an environment for self-expression, sharing perspectives on issues, and engaging in online discussions. The inclusivity of social media allows diverse groups and individuals to participate, creating a level playing field for the aggregation of diverse views and opinions. Moreover, social media serves as a significant alternative indicator for measuring consumer confidence (Ilhamalimy et al., 2021; Tiep et al., 2021; Yu et al., 2021). Through these platforms, consumers can articulate their views on various aspects of the economy that impact their daily lives. Complaints and comments regarding goods prices, inflation, or other economic issues, often shared on social media, offer real-time insights into consumer satisfaction or dissatisfaction with economic conditions (Domalewska, 2021; Hirata & Matsuda, 2023). Conrad et al. (2019) discovered a relatively high correlation between the sentiment of tweets containing the word "jobs" and the Consumer Sentiment Index (CSI) during 2008-2009. However, the relationship began to decline after 2011, suggesting that such tweets may not have been as effective as expected as a surrogate for survey responses. Meanwhile, Shayaa et al. (2018) found a significant relationship between the Consumer Confidence Index (CCI) and social media sentiment on consumer

Behavioral Economics
Behavioral Economics is a branch of economics that examines the impact of psychological factors on economic decisions made by individuals or groups (de Bruijn & Antonides, 2022; Vlaev et al., 2019). Notably, Kahneman & Tversky (1979), pivotal figures in Behavioral Economics, have proposed theories that offer valuable insights into consumer behavior. One of their key theories is Prospect Theory, which integrates two fundamental concepts: the nature of utility and rationality. Unlike normative economic theory, Prospect Theory describes how people actually behave in the face of uncertainty (Zielonka & Szymanek, 2023). According to this theory, an individual's value function follows an S-shaped pattern: it is concave over gains and convex over losses. This implies that consumers tend to be risk-averse when facing gains but become risk-seeking when facing losses. This pattern underscores that consumers are more responsive to changes in value relative to their reference position than to changes in absolute values (Ayaa et al., 2022; Bougherara et al., 2021).
Furthermore, Kahneman & Tversky (1979) introduced the concept of framing, which asserts that the presentation of information can influence consumer decisions. This concept holds particular relevance in marketing and promotional strategies, where pricing and promotional framing can impact consumer perceptions of value and purchasing decisions. The theory highlights that individuals often make irrational economic decisions due to cognitive and emotional biases. For instance, individuals are more averse to losing than they are inclined to enjoy equivalent gains, a phenomenon known as loss aversion. Referring to Kahneman & Tversky (1979), it is plausible that the noise on social media, especially Twitter, can serve as a proxy for understanding consumer behavior, and this concept is referred to as sentiment analysis.

Sentiment Analysis
Sentiment analysis, a computational technique discussed by scholars such as Guenich et al. (2022) and Guo et al. (2023), examines individual opinions, evaluations, attitudes, and emotions. It is applied to diverse entities, including products, services, and organizations. Within this scope, sentiment analysis encompasses several related concepts, including opinion gathering, opinion analysis, and influence analysis, each contributing to an understanding of subjective expression in digital discourse (López-Cabarcos et al., 2020; Xiang, 2022). Rooted in a combination of linguistic and computational approaches, this analytical method dissects and categorizes sentiments, with a primary emphasis on discerning their polarity, whether positive or negative. This approach, as delineated by López-Cabarcos et al. (2020) and Xiang (2022), yields insights into human sentiment within digital communications. Conrad et al. (2019) investigated the viability of utilizing social media as an alternative to survey data. They observed a notable correlation between the sentiment in tweets containing the word "jobs" and survey-based measures of consumer confidence. Another study by Shayaa et al. (2018) examined the correlation between the Consumer Confidence Index (CCI) and social media sentiment regarding consumer purchases of two product types during the two-year period from 2015 to 2016. Shayaa et al. (2018) identified a significant relationship between CCI and social media sentiment analysis, demonstrating that social media can yield substantial data on consumer trust. In a different approach, Ortega-Bastida et al. (2021) applied a multimodal approach to predict regional GDP using Twitter data. This method delivers GDP estimates at a higher frequency than official statistics and provides robust quarterly predictions.

Sentiment Analysis and Economic Indicators in Indonesia
Indonesia stands to derive substantial benefits from leveraging sentiment analysis as a tool for measuring key economic indicators such as the Consumer Confidence Index (CCI) and Gross Domestic Product (GDP). This proposition is rooted in consumer behavior theory, supported by empirical evidence demonstrating that consumer sentiment, a critical element of economic activity, is effectively captured through sentiment analysis of social media. Consumer behavior theory posits that sentiments, perceptions, and attitudes significantly influence purchasing decisions, thereby impacting economic outcomes. Social media, as a reflection of public discourse, serves as a rich, real-time data source summarizing these sentiments. Empirical evidence further supports the idea that fluctuations in consumer sentiment, as expressed on platforms like Twitter, are correlated with changes in economic indicators (Conrad et al., 2019; Shayaa et al., 2018).
In the distinctive context of Indonesia, a vast archipelagic nation, traditional methods of obtaining economic data may present challenges, resulting in data gaps and reduced accuracy. The decentralized nature of the country poses logistical challenges for timely and comprehensive data collection (Andiojaya et al., 2022). However, sentiment analysis provides a solution by offering a rapid and diverse understanding of public sentiment regarding economic conditions. This approach not only circumvents the limitations of traditional data collection methods but also enables a more dynamic and responsive assessment of economic indicators. By embracing sentiment analysis, researchers and authorities in Indonesia can overcome obstacles associated with obtaining accurate and timely economic data. The availability of social media data allows for quicker responses to emerging trends and sentiments, fostering a more agile and informed approach to economic analysis and policymaking. In essence, sentiment analysis is emerging as a valuable tool for enhancing the accuracy, speed, and comprehensiveness of measuring economic indicators in a geographically diverse and expansive country like Indonesia.

Research Method
The initial phase of the research procedure involves data collection. CCI and GDP data were sourced from the Bank Indonesia website. For tweet data, we utilized the "snscrape" and "tweepy" libraries in Python to scrape Twitter data, covering the period from January 2019 to December 2022. The selection of keywords for scraping aligns with the Bank Indonesia consumer survey (refer to Table 1). Due to limited computing resources, the authors constrained data collection to 1,000 tweets per month per keyword. Under this limit, a total of 1,220,992 tweets were collected in this first step. The collected tweets were then reviewed to confirm that each tweet text contains the specified keywords. The data cleansing process also aimed to ensure that the harvested tweets are associated with legitimate users. Following data cleaning, 1,131,401 clean tweets were retained, meeting the specified criteria.
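The keyword check and the per-month, per-keyword cap described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (`date`, `text`, `user`), the function name `filter_and_cap`, and the sample keyword "harga" are all assumptions made for the example, and the scraping step itself is not shown.

```python
from collections import defaultdict

MAX_PER_MONTH_PER_KEYWORD = 1000  # cap stated in the text

# Hypothetical sample records; the real data came from scraping.
sample_tweets = [
    {"date": "2019-01-05", "text": "Harga beras naik lagi", "user": "a"},
    {"date": "2019-01-06", "text": "Harga cabai turun", "user": "b"},
    {"date": "2019-01-07", "text": "Selamat pagi", "user": "c"},
    {"date": "2019-02-01", "text": "harga BBM stabil", "user": "d"},
]


def filter_and_cap(tweets, keywords, cap=MAX_PER_MONTH_PER_KEYWORD):
    """Keep tweets whose text contains a keyword, at most `cap` per (month, keyword)."""
    kept = []
    counts = defaultdict(int)
    for tweet in tweets:
        text = tweet["text"].lower()
        month = tweet["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        for keyword in keywords:
            if keyword.lower() in text and counts[(month, keyword)] < cap:
                counts[(month, keyword)] += 1
                kept.append({**tweet, "keyword": keyword})
                break  # assumption: a tweet is assigned to one keyword only
    return kept


# With a cap of 1, only the first matching tweet of each month survives.
capped = filter_and_cap(sample_tweets, ["harga"], cap=1)
print([t["user"] for t in capped])
```

Tweets without any keyword are dropped, which mirrors the review step in the text; checks for legitimate users would need account metadata that this sketch does not model.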

Data Labelling
In this stage, tweets are labeled as belonging to one of two groups: positive or negative. Positively labeled tweets convey positive thoughts or impressions towards specific terms, while negatively labeled tweets express negative ideas or criticism. The Lexicon-Based method, implemented with the "Literature" library in Python, is employed for this labeling process. After assigning a sentiment label to each tweet, researchers further categorized the tweets based on the account owner into three groups: All Accounts, News Accounts, and Non-News Accounts. The following is an explanation of these categories: a) All Accounts: this is the original category, before tweets were grouped based on their account owner. All tweets in the dataset are included in this category automatically; b) News Accounts: this category includes tweets whose account owners are news media companies that have been registered and verified by the Indonesian press council. Using the "Lookup" function in Excel, researchers carried out a matching process with the database from the Indonesian press council; c) Non-News Accounts: this category includes tweets from accounts that do not correspond to news media companies in the Indonesian press council database (refer to Figure 1 for detail).
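A lexicon-based labeler and the account grouping can be sketched as below. This is only an illustration under stated assumptions: the word lists are a tiny hypothetical lexicon, the handling of ties (returning `None`) is a choice made here rather than something the text specifies, and `NEWS_ACCOUNTS` stands in for the press council database that the authors matched in Excel.

```python
import re

# Hypothetical mini-lexicon; a real Indonesian lexicon would be far larger.
POSITIVE_WORDS = {"baik", "untung", "stabil", "murah"}
NEGATIVE_WORDS = {"mahal", "rugi", "buruk", "krisis"}

# Hypothetical stand-in for the press council list of verified news media.
NEWS_ACCOUNTS = {"example_news"}


def label_sentiment(text):
    """Label a tweet by counting lexicon hits; ties and no hits return None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # assumption: undecided tweets are left unlabeled


def account_categories(username, news_accounts):
    """Every tweet is in All Accounts, plus exactly one of the other two groups."""
    is_news = username.lower() in news_accounts
    return ["All Accounts", "News Accounts" if is_news else "Non-News Accounts"]


print(label_sentiment("Harga beras mahal, ekonomi buruk"))
print(account_categories("Example_News", NEWS_ACCOUNTS))
```

Because All Accounts is the union of the other two groups, correlations computed on it blend news and non-news sentiment.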

Normality test
Before conducting the correlation test, a normality test is required to check whether the data follow a normal distribution, which is a prerequisite for the subsequent parametric statistical tests. The Kolmogorov-Smirnov (K-S) method, executed in RStudio, is employed for this normality test. The K-S test is a non-parametric test that compares a sample distribution with a theoretical distribution, or compares the distributions of two samples. In the K-S test statistic (equation 4), F_0(x) represents the theoretical cumulative distribution function, n is the number of observations, and x_i is the i-th observation after sorting. The null hypothesis of this test posits that the data conform to a normal distribution. If the p-value resulting from the K-S test is smaller than the specified significance level of 0.05, then the null hypothesis is rejected, signifying that the data do not follow a normal distribution.
In addition to the Pearson product-moment correlation coefficient (equation 5), researchers employ Kendall's Tau (Kendall, 1938) as a robustness test (refer to equation 6). If τ > 0, the variables are positively correlated, and if τ < 0, the variables are negatively correlated. This method provides an alternative approach for assessing the correlation between variables and is particularly robust in capturing monotonic relationships, even in the presence of outliers or non-normally distributed data.
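The three statistics named above can be written out in a few lines of plain Python. This sketch uses the standard textbook forms, which may differ in detail from the authors' equations (4)-(6): the K-S statistic is computed against a normal distribution fitted with the sample mean and standard deviation, and Kendall's Tau is the tau-a variant without tie correction. It returns the statistics only; p-values would normally come from a statistics package.

```python
import math


def ks_statistic_normal(data):
    """D = max_i max(F0(x_i) - (i-1)/n, i/n - F0(x_i)) over the sorted sample."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

    def f0(x):
        # Normal CDF with the sample mean and standard deviation (assumption).
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

    # enumerate() is 0-based, so i/n is (i-1)/n in 1-based terms.
    return max(max(f0(x) - i / n, (i + 1) / n - f0(x)) for i, x in enumerate(xs))


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau_a(x, y):
    """tau = (concordant - discordant) / (n(n-1)/2); tied pairs count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


sentiment = [1, 2, 3, 4, 5]          # hypothetical values
indicator = [2, 4, 6, 8, 10]         # hypothetical values
print(pearson_r(sentiment, indicator), kendall_tau_a(sentiment, indicator))
```

Kendall's Tau depends only on the ordering of the observations, which is why it is the more robust of the two when the normality check fails.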

Result and Discussion
The analysis of Table 2 reveals a prevalence of positive sentiment tweets over negative tweets based on the sentiment analysis process conducted on the dataset. Given that the Consumer Confidence Index (CCI) demonstrates a non-normal distribution according to the Kolmogorov-Smirnov test (see Table 3), the researchers opted to perform a normality transformation of the data using natural logarithms, following the methodology outlined by Lee (2020). In this research, despite the primary focus being on measuring correlation, the normality test remains crucial. This is because data normality is a fundamental assumption that must be satisfied when applying Pearson's correlation coefficient, as emphasized by Khosravi et al. (2023) and van den Heuvel & Zhan (2022).
Tables 4 and 5 reveal that Twitter sentiment fails to adequately capture the Consumer Confidence Index (CCI) in Indonesia. This observation aligns with several studies demonstrating that while social media is pivotal in representing public opinion, it does not consistently exhibit a directly proportional impact on consumer confidence (Abbas et al., 2022; Li et al., 2022; Liu et al., 2023; Primananda et al., 2022). It is noteworthy, however, that sentiment originating from news accounts demonstrates a higher correlation compared to sentiment from other accounts. This underscores the notion that opinions expressed through news accounts tend to be more reliable (Ortega-Bastida et al., 2021; Sampietro & Salmerón, 2021). Furthermore, our results indicate a significant and positive relationship between sentiment on Twitter, particularly from news accounts, and Gross Domestic Product (GDP). Conversely, sentiment from non-news accounts does not exhibit a significant relationship with GDP. This finding introduces a novel perspective and evidence compared to studies by Abbas et al. (2022), Jabeen et al. (2022), and Nia et al. (2022), suggesting that a positive relationship between Twitter sentiment and GDP is more likely when sentiment from news accounts is predominant. These findings underscore the pivotal role of social media, particularly Twitter, as a platform for capturing public opinion that can be linked to macroeconomic indicators. This opens avenues for economic and business policymakers to leverage social media as an analytical tool in decision-making processes.
The results described here present several advantages over, and significant deviations from, prior studies. In this comparative analysis, we emphasize several key points. This research is specifically tailored to Indonesia, a context characterized by unique social, economic, and social media features, thereby contributing valuable insights into the impact of social media sentiment on economic indicators, particularly from the perspective of developing countries. While Conrad et al. (2019) focused their research on the United States with limited keywords, we identified sentiment using 28 keywords reflective of consumer survey topics employed by the central bank, ensuring the collected data remains pertinent to critical issues within the Indonesian economic context. Additionally, this study distinguishes between news accounts and non-news accounts on Twitter, providing deeper insights into sentiment analysis, a differentiation absent from the works of Conrad et al. (2019), Shayaa et al. (2018), and Ortega-Bastida et al. (2021).
Moreover, these findings underscore the intricate interactions between social media, news, CCI, and GDP. Positive or negative sentiments posted via news accounts may set off a domino effect in shaping consumer perceptions and actions. Twitter news accounts, recognized as reliable sources of information, can significantly influence consumer perceptions of the economy. Consequently, sentiment emanating from news sources may impact consumer behavior, thereby influencing overall economic activity. Thus, these findings not only affirm the potential of social media in capturing public opinion but also highlight the likelihood of social media influencing consumer behavior. As a result, the development of a lag model becomes imperative for future research, particularly to substantiate the hypothesis that social media can reshape consumer behavior.
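One simple starting point for the lag model suggested above is to correlate sentiment at time t with the indicator at time t + k for several lags k. The sketch below is only one possible reading of "lag model", not a method used in this study; the function names and the sample series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlations(sentiment, indicator, max_lag):
    """Correlate sentiment[t] with indicator[t + lag] for lag = 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        x = sentiment[: len(sentiment) - lag]  # drop the last `lag` periods
        y = indicator[lag:]                    # drop the first `lag` periods
        results[lag] = pearson_r(x, y)
    return results


# Hypothetical series in which the indicator follows sentiment by one period.
sentiment = [1, 3, 2, 5, 4, 6, 5, 8]
indicator = [0, 1, 3, 2, 5, 4, 6, 5]
for lag, r in lagged_correlations(sentiment, indicator, max_lag=2).items():
    print(lag, round(r, 3))
```

A correlation that peaks at a positive lag would be consistent with sentiment leading the indicator, although it would not by itself establish that social media reshapes consumer behavior.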

Conclusions, suggestions and limitations
Our examination of Twitter sentiment, the Consumer Confidence Index (CCI), and Gross Domestic Product (GDP) in Indonesia provides noteworthy findings. Twitter sentiment, particularly from news accounts, exhibits a significant positive correlation with GDP, underscoring its potential as an early indicator of economic activity. However, its effectiveness in discerning variations in the CCI is limited, aligning with previous research highlighting the nuanced role of social media in reflecting public opinion and its inconsistent influence on consumer confidence. The outcomes of this study offer valuable insights for policymakers, providing a basis for informed economic analysis and decision-making processes. The study suggests that understanding the distinct impact of social media sentiment from news accounts can contribute to more accurate predictions of economic trends.
Looking ahead, we advocate for future research to explore lag models, offering a more comprehensive assessment of social media's potential in reshaping consumer behavior over time. This could unveil deeper patterns and dynamics in the relationship between social media sentiment and economic indicators. In summary, our findings carry implications for both academia and practical policymaking, contributing to a richer understanding and application of social media in economic analyses.
The aggregation of tweet sentiment is measured using three methods, following O'Connor et al. (2010), as detailed in equations (1)-(3), where P is the positive sentiment and N is the negative sentiment. Sentiment calculations based on daily tweet data are then transformed into monthly and quarterly sentiment data through arithmetic addition. These periods are chosen to match the CCI and GDP data, which are published monthly and quarterly, respectively.
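The summation from daily to monthly and quarterly totals can be sketched as follows. The three aggregation formulas of equations (1)-(3) are not reproduced in this text, so the sketch shows only the positive-to-negative ratio used by O'Connor et al. (2010) as one example score; the data layout and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical daily counts: (ISO date, positive tweets P, negative tweets N).
daily_counts = [
    ("2019-01-01", 10, 5),
    ("2019-01-02", 20, 5),
    ("2019-02-01", 6, 3),
    ("2019-04-01", 4, 8),
]


def aggregate_counts(daily, period="month"):
    """Sum daily P and N counts into monthly ("month") or quarterly totals."""
    totals = defaultdict(lambda: {"P": 0, "N": 0})
    for date, p, n in daily:
        year, month = int(date[:4]), int(date[5:7])
        if period == "month":
            key = f"{year}-{month:02d}"
        else:
            key = f"{year}-Q{(month - 1) // 3 + 1}"
        totals[key]["P"] += p
        totals[key]["N"] += n
    return dict(totals)


def ratio_score(p, n):
    """Positive-to-negative ratio P/N; None when there are no negative tweets."""
    return p / n if n else None


monthly = aggregate_counts(daily_counts, period="month")
quarterly = aggregate_counts(daily_counts, period="quarter")
print({k: ratio_score(v["P"], v["N"]) for k, v in monthly.items()})
```

Summing counts before forming a score weights each day by its tweet volume, which is not the same as averaging daily scores.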

Table 1. Indicators and keywords

Table 2. Results of sentiment analysis

Table 4. Results of Pearson correlation test

Table 5. Results of Kendall's Tau correlation test