Can news sentiment forecast macroeconomic data?

In recent years, researchers have explored sentiment from various media and its usefulness in predicting financial markets. However, macroeconomic forecasting has not yet been a focus. We show how forecasts of industrial production and consumer prices can be improved by applying a new method to incorporate emotions from newspaper articles.

Dr Markus Ebner
Head of Multi-Asset

Macroeconomic forecasts are typically based on models that assume individual rationality and foresight. At the same time, behavioural research shows that emotions and narratives play a significant role in decision-making and judgement.

To assess whether news, emotion, and narrative can improve macroeconomic forecasts, we use data from the Global Database of Events, Language and Tone (GDELT). GDELT is a research collaboration that analyses global news articles and extracts elements such as themes, emotions, and locations. We use a machine learning-based filtering method to identify the articles in the database that are relevant to the respective macroeconomic indices.

Our analysis shows that the emotions expressed in these news items add value to the prediction of industrial production and consumer prices in a variety of economies. The countries considered are the U.S., the U.K., Germany, Norway, Poland, Turkey, Japan, South Korea, Brazil, and Mexico.

The first step of our analysis is a simple keyword filter applied to the GDELT topics to remove irrelevant information. The GDELT algorithm extracts themes from each news article, and the database contains over 12,000 unique topics. In a second step, a Bi-LSTM neural network is trained on a set of 1,000 random articles, which are manually classified according to their relevance to industrial production and consumer prices. The Bi-LSTM is then applied to the data resulting from the keyword filter. Finally, the filtered data is aggregated by location on a monthly basis.
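The first and last steps of this pipeline can be illustrated with a minimal sketch. It covers only the keyword filter and the monthly aggregation by location; the Bi-LSTM relevance classifier is left out. The keyword list, the theme names, and the record fields (`date`, `location`, `themes`, `tone`) are hypothetical and chosen for illustration only; they are not taken from GDELT or from the implementation behind this study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical keyword list; the actual keywords are not published in this article.
KEYWORDS = ("INFLATION", "PRICE", "MANUFACTURING", "INDUSTR")


def passes_keyword_filter(themes, keywords=KEYWORDS):
    """Return True if any theme label contains any of the keywords."""
    return any(k in theme.upper() for theme in themes for k in keywords)


def aggregate_monthly(articles):
    """Average the tone of articles that pass the filter per (location, 'YYYY-MM')."""
    buckets = defaultdict(list)
    for art in articles:
        if passes_keyword_filter(art["themes"]):
            month = art["date"][:7]  # ISO date 'YYYY-MM-DD' -> 'YYYY-MM'
            buckets[(art["location"], month)].append(art["tone"])
    return {key: mean(values) for key, values in buckets.items()}


# Made-up example records (not real GDELT data).
articles = [
    {"date": "2020-01-05", "location": "DE", "themes": ["ECON_INFLATION"], "tone": -2.0},
    {"date": "2020-01-20", "location": "DE", "themes": ["ECON_MANUFACTURING"], "tone": 1.0},
    {"date": "2020-01-11", "location": "DE", "themes": ["SPORTS"], "tone": 5.0},
    {"date": "2020-02-02", "location": "US", "themes": ["ECON_PRICE_OIL"], "tone": 0.5},
]

monthly = aggregate_monthly(articles)
```

In this toy example the sports article is dropped by the filter, so the German January value is the average of the two remaining articles. In the full pipeline, the Bi-LSTM classifier would sit between the filter and the aggregation.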

“Incorporating a broad range of emotions into macroeconomic forecasts leads to better predictions.”

Dr Markus Ebner
Head of Multi-Asset


GDELT provides hundreds of sentiment scores for each news article. To extract the underlying tone from a news story, we apply principal component analysis on these sentiment scores as a dimensionality reduction technique. This allows us to identify correlations between data points and recognise patterns. We then restrict the analysis to the first three derived principal components.
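As a rough illustration of this step, the sketch below computes the first three principal components of a matrix of sentiment scores using a singular value decomposition. The synthetic data, the function name `first_principal_components`, and the choice to standardise each score before the decomposition are assumptions made for this example; the article does not specify how the scores were preprocessed.

```python
import numpy as np


def first_principal_components(scores, n_components=3):
    """Project rows of `scores` (observations x sentiment scores) onto the leading PCs."""
    X = np.asarray(scores, dtype=float)
    X = X - X.mean(axis=0)                   # centre each sentiment score
    std = X.std(axis=0, ddof=1)
    X = X / np.where(std > 0, std, 1.0)      # standardise (assumption); skip constant columns
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    components = X @ Vt[:n_components].T     # component scores per observation
    explained = S**2 / np.sum(S**2)          # share of variance per component
    return components, explained[:n_components]


# Synthetic stand-in for monthly sentiment data: 120 months, 50 scores
# driven by three latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(120, 3))
loadings = rng.normal(size=(3, 50))
scores = latent @ loadings + 0.1 * rng.normal(size=(120, 50))

pcs, explained = first_principal_components(scores)
```

Each column of `pcs` is one derived factor, and the factors are uncorrelated with one another by construction, which makes them convenient regressors in a forecasting model.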

Overview of the components of the models and benchmarks
Source: Quoniam Asset Management
Description of the baseline model

The baseline model represents a trend/seasonality approach, assuming that the development of macroeconomic factors is driven by their own past. To be more precise, we use an autoregressive model with a lag length of three months. To account for macroeconomic effects, we additionally include the Baltic Dry Index and crude oil price when modelling industrial production (IP).
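A baseline of this kind can be sketched as an ordinary least squares regression of the target on its own three lags plus exogenous variables. The code below is a minimal illustration on synthetic data, not the study's implementation. In particular, the one-month lag applied to the exogenous variables, the coefficient values used to simulate the series, and the helper names `build_design` and `fit_ols` are assumptions.

```python
import numpy as np


def build_design(y, exog, lags=3):
    """Build rows [1, y[t-1], ..., y[t-lags], exog[t-1]] with target y[t]."""
    y = np.asarray(y, dtype=float)
    exog = np.asarray(exog, dtype=float).reshape(len(y), -1)
    rows, targets = [], []
    for t in range(lags, len(y)):
        lagged = y[t - lags:t][::-1]  # y[t-1], y[t-2], y[t-3]
        rows.append(np.concatenate(([1.0], lagged, exog[t - 1])))
        targets.append(y[t])
    return np.array(rows), np.array(targets)


def fit_ols(X, target):
    """Least squares coefficients for target ~ X."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


# Synthetic series: an AR(3) process with one relevant exogenous driver.
# The two exogenous columns stand in for, e.g., a freight index and the oil price.
rng = np.random.default_rng(1)
n = 200
exog = rng.normal(size=(n, 2))
y = np.zeros(n)
for t in range(3, n):
    y[t] = (0.5 * y[t - 1] - 0.2 * y[t - 2] + 0.1 * y[t - 3]
            + 0.3 * exog[t - 1, 0] + 0.05 * rng.normal())

X, target = build_design(y, exog)
coef = fit_ols(X, target)  # [const, lag1, lag2, lag3, exog1, exog2]
```

Additional regressors such as sentiment factors would enter the same design matrix as further columns.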

Table 1: Consumer Price Index

Source: Refinitiv, GDELT, own calculations. Numbers represent the RMSE (%). Blue (red) cells denote cases in which the model outperforms (underperforms) the benchmark.
Numbers in parentheses correspond to the number of significant coefficients associated with GDELT factors in the model.
(*** denotes at least one GDELT sentiment factor with p-value < 0.01, ** < 0.05, * < 0.1)

Table 2: Industrial Production

Source: Refinitiv, GDELT, own calculations. Numbers represent the RMSE (%). Blue (red) cells denote cases in which the model outperforms (underperforms) the benchmark.
Numbers in parentheses correspond to the number of significant coefficients associated with GDELT factors in the model.
(*** denotes at least one GDELT sentiment factor with p-value < 0.01, ** < 0.05, * < 0.1)

For models forecasting consumer price indices (CPI), the countries' respective terms-of-trade indices and the crude oil price are included. This model setup is then enriched by the first three derived principal components from the GDELT sentiment scores.

Tables 1 and 2 show the results of the analysis. For each country, we first show the root mean square error (RMSE) of the model. The following three columns show the RMSE for three different benchmarks.
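The comparison underlying the tables can be illustrated with a short sketch of the root mean square error, the metric named in the table notes. The forecast values below are made up for the example and are not results from the study.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Made-up numbers for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
model_forecast = [1.0, 2.0, 3.0, 2.0]
benchmark_forecast = [3.0, 4.0, 5.0, 6.0]

model_rmse = rmse(actual, model_forecast)
benchmark_rmse = rmse(actual, benchmark_forecast)
outperforms = model_rmse < benchmark_rmse  # lower RMSE means a better forecast
```

A model cell would be marked as outperforming a benchmark whenever its RMSE is the lower of the two.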

As can be seen, the CPI models with filtered GDELT sentiment factors outperform BM1 for eight, BM2 for nine, and BM3 for seven out of ten countries. Moreover, nine out of ten models include one or more statistically significant GDELT sentiment factors.

The results for industrial production are similar, with superior results for eight of ten countries and at least one statistically significant GDELT sentiment factor for each country. The results suggest that the filtering methodology isolates relevant signals, as the models using filtered sentiment consistently perform better than the models using unfiltered sentiment.

Our analysis shows that including sentiment factors based on unstructured data improves macroeconomic forecasts. This matters for investors' understanding of the current economic situation and its future development and, hence, forms the basis for their mid-term investment decisions.