Sentieo FedSpeak Lexicon

Sentieo Data Science


Reading The Tea Leaves From Federal Reserve Statements

When economists talk about inflation, they may describe themselves as “hawkish,” meaning in favor of policies that combat inflation, or “dovish,” meaning less concerned about inflation pressures in the economy.

This nomenclature has migrated into the often jargon-heavy discussions around US central bank interest rate policy. In this world, “hawkish” refers to the Federal Reserve’s inclination to raise the overnight borrowing rate, and “dovish” conversely reflects a tendency to lower rates or leave rates unchanged.

The Federal Reserve sets monetary policy via the Federal Funds Rate, which is the rate at which banks lend excess reserves to one another. Changes in this interbank lending rate pass into the economy through Fed member banks, which in turn pass their input cost of money on to their customers.

In effect, the Federal Reserve controls the price of money in the US economy. Therefore, figuring out which way the Fed is leaning in terms of “hawkishness” or “dovishness” is of great interest to money market participants.

Natural Language Processing

One of our core competencies here at Sentieo is Natural Language Processing. NLP allows us to build predictive models from various sets of document data. We might work with SEC documents for a specific company to extract company-specific key performance indicators, chain financial tables together over time for spreadsheet models, train models to extract guidance statements from company press releases, or classify research reports by type. In all cases, we’re using machine learning and deep learning for predictive analytics on a “corpus” of documents.

As an exercise, we took a similar approach to Federal Reserve Meeting Minutes and then applied what we learned from this modeling to Fed Statements.

The Data

The Federal Open Market Committee (FOMC) meets on a regular basis several times a year to discuss the state of the US economy and decide on where to set the level of short term interest rates. These Meeting Minutes and Statements are published at

As background, the Statements are released at the same time as the Fed’s market action, usually around 2:15 in the afternoon on the last day of the meeting. The Minutes are then published a couple of weeks later.

When the Statements come out, market participants have very little time to react (the market is open when the Fed releases its Statement) and very little data to work with from an empirical perspective (the Statements are much shorter and less descriptive than the Minutes).

For these reasons, parsing Fed Statements has become an industry unto itself.


We scraped the documents in two sets: the Meeting Minutes and the Meeting Statements. We used the Meeting Minutes to train a machine learning classification model that separates Meetings where the Fed raised rates from Meetings where it lowered rates. We then used the word-level coefficients from that model to create a “FedSpeak” lexicon, which we applied to the Meeting Statements in an effort to measure the relative “hawkishness” of the Statements over time. We used this mixed approach (machine learning plus lexicon) to help make sense of the multi-year interregnum when the Fed left rates at zero.

We trained our model on the Meeting Minutes because these are longer files with more data about the FOMC deliberations. We assembled the Minutes into a dataframe arranged by date, and further split the data into sentences and then filtered the dataframe to remove non-meeting-related text (description of open market operations, list of attendees, etc.).

Prior to modeling, we took a look at the data by simple word frequency. For visualization purposes, we classified each meeting as a “hike” or a “cut” meeting, and then arranged the most common words in each type of meeting.
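The word-frequency look described above can be sketched in a few lines. The analysis itself was done in R; the Python below is only an illustration, and the sample sentences, the tiny stop-word list, and the function names are all made up for the example rather than taken from the actual pipeline.

```python
from collections import Counter
import re

# Hypothetical toy sentences; the real input is sentences from FOMC Minutes,
# each tagged with the outcome of its meeting ("hike" or "cut").
sentences = [
    ("hike", "Inflation pressures remained elevated and growth was strong."),
    ("hike", "Members judged that inflation risks warranted a firmer policy."),
    ("cut", "Economic activity weakened and financial conditions deteriorated."),
    ("cut", "Members saw downside risks to growth and weak credit conditions."),
]

# Assumed, deliberately tiny stop-word list for the example.
STOP_WORDS = {"a", "and", "that", "the", "to", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts_by_class(labelled_sentences):
    """Return {label: Counter of word frequencies} for labelled sentences."""
    counts = {}
    for label, text in labelled_sentences:
        words = [w for w in tokenize(text) if w not in STOP_WORDS]
        counts.setdefault(label, Counter()).update(words)
    return counts


counts = word_counts_by_class(sentences)
for label, counter in counts.items():
    # Show the three most common words for each type of meeting.
    print(label, counter.most_common(3))
```

Ranking the counters side by side gives a quick sanity check that the two meeting types really do use different vocabulary before any model is fitted.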

After performing this tokenizing step (tokenizing means splitting the text into individual words) and then additionally creating a sparse matrix for use in our machine learning model, we had 3,412 observations and 3,347 features in the Fed Minutes matrix ready for processing.
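To make the idea of a sparse matrix concrete, here is a minimal sketch in Python. It is not the R code used for the analysis: the two sample sentences and the function names are hypothetical, and each row is stored as a plain dictionary of column index to count so that zero entries take no space.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_document_term_matrix(documents):
    """Return (vocabulary, rows); rows[i] maps a column index to a word count.

    Only non-zero counts are stored, which is what makes the matrix sparse.
    """
    vocabulary = {}  # word -> column index, assigned in order of first sight
    rows = []
    for text in documents:
        row = {}
        for word in tokenize(text):
            col = vocabulary.setdefault(word, len(vocabulary))
            row[col] = row.get(col, 0) + 1
        rows.append(row)
    return vocabulary, rows


# Hypothetical sentences standing in for sentences from the Minutes.
docs = ["Inflation pressures increased.", "Growth slowed and inflation eased."]
vocab, matrix = build_document_term_matrix(docs)
print(len(matrix), "observations x", len(vocab), "features")
```

Each sentence becomes one observation (row) and each distinct word one feature (column), which is how a corpus of a few thousand sentences ends up as a matrix with a few thousand columns.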

We then joined our “hike” or “cut” classification variable to the Fed Minutes to act as the response variable for prediction.

In effect, we sought to determine the “hike” probability based on two classes of Fed Minutes: those where the Fed raised rates and those where the Fed lowered rates.

(Why not use Meeting Minutes from the long interregnum after 2008, when rates stayed at zero? We tried. We used 1-month T-bill rates, LIBOR, and 3-month T-bill rates in a multivariate logistic regression model. Because rates rarely moved when the Minutes came out, we were in effect training a model to predict nothing from nothing, and our overall probability of successfully predicting rate hike language became very small as a result. We will probably revisit this issue in future posts, and it’s a goal of ours to model the interregnum against some form of significant, exogenous classifier, perhaps the TED spread.)

Given our final matrix of input data, we then trained our classification model using the R package “glmnet” to fit a logistic regression model with LASSO regularization.

Importantly, the variable selection that LASSO regularization performs allowed us to determine which words were most important for prediction.
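The way an L1 (LASSO) penalty performs variable selection can be shown with a toy example. The sketch below is not glmnet, which uses coordinate descent and fits an intercept; it is a simplified stand-in that uses proximal gradient descent with no intercept, and the three-word vocabulary, the data, and the penalty value are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Shrink value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.5, iterations=2000):
    """Minimise mean logistic loss + lam * sum(|beta_j|) by proximal gradient.

    Simplified on purpose: no intercept, fixed step size, fixed iteration count.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        probs = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((probs[i] - y[i]) * X[i][j] for i in range(n)) / n
                for j in range(p)]
        # Gradient step on the loss, then soft-threshold for the L1 penalty.
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam)
                for j in range(p)]
    return beta


# Hypothetical word-presence features for four sentences (1 = hike, 0 = cut).
features = ["inflation", "weak", "committee"]
X = [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
coefficients = dict(zip(features, fit_lasso_logistic(X, y, lam=0.05)))
print(coefficients)
```

In this toy data “committee” appears in every sentence and carries no information about the outcome, so the penalty leaves its coefficient at exactly zero, while “inflation” and “weak” receive positive and negative coefficients respectively. That zeroing-out is the selection effect that lets the surviving words be ranked by importance.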

We then used the sort order of importance to build our FedSpeak Lexicon. Words with positive coefficients predicting “hike” were termed “hawkish.” Words with negative coefficients were termed “dovish.”
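Turning a set of fitted coefficients into a two-sided lexicon amounts to a sort and a split by sign. The Python sketch below illustrates the idea only; the words and coefficient values are invented for the example and are not output from the actual model.

```python
def build_lexicon(coefficients, top_n=10):
    """Split word coefficients into ranked hawkish and dovish lists.

    Words with a zero coefficient (dropped by the LASSO) are ignored.
    """
    nonzero = {w: c for w, c in coefficients.items() if c != 0.0}
    ranked = sorted(nonzero.items(), key=lambda item: item[1], reverse=True)
    # Most positive first for hawkish, most negative first for dovish.
    hawkish = [(w, c) for w, c in ranked if c > 0][:top_n]
    dovish = [(w, c) for w, c in reversed(ranked) if c < 0][:top_n]
    return {"hawkish": hawkish, "dovish": dovish}


# Hypothetical coefficients, for illustration only.
example_coefficients = {
    "tightening": 1.8,
    "inflation": 0.9,
    "committee": 0.0,
    "weak": -1.2,
    "deterioration": -2.1,
}
lexicon = build_lexicon(example_coefficients, top_n=10)
print(lexicon["hawkish"])
print(lexicon["dovish"])
```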


As an example, we show the top 10 hawkish and dovish words in the model.

Model Performance

As a way of visualizing how our predictive model performed, we’ve included a chart showing prediction results: the % probability of “hike” for each of the Fed Minutes we used in our classification model.

The forest green box plots are Fed Minutes where the Fed raised rates. The blue box plots are Minutes where the Fed lowered rates. As expected, in cases where the Fed raised rates, our model assigned a high probability of “hike,” and in cases where the Fed lowered rates, the probability of “hike” is very low.

The black dots are outliers. The height of each box gives a sense of the spread of the predictions. The bottom of the box is the lower quartile of the data, the top of the box is the upper quartile, and the line inside the box is the median.
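For readers who want to see what goes into one box, here is a small Python sketch that computes the quartiles for a single set of predicted probabilities. The probabilities are invented, the 1.5 × IQR rule for flagging outliers is the common Tukey convention and is assumed here rather than taken from the chart, and the exact quartile method used by a given plotting library may differ slightly.

```python
import statistics


def box_summary(values):
    """Return quartiles, interquartile range, and outliers for one box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "outliers": outliers}


# Hypothetical "hike" probabilities for the sentences of one meeting.
probabilities = [0.05, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
summary = box_summary(probabilities)
print(summary)
```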

Results Applied to Fed Statements

Lastly, we turned away from the Fed Minutes, and loaded our dataframe of the history of the Fed’s Statements from 2008 onwards.

We applied the coefficient-driven FedSpeak Lexicon to each Statement to get a relative sense of each Statement’s “hawkishness.”
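One simple way to apply a coefficient-weighted lexicon to a Statement is to add up the coefficients of the words that appear and normalize by length. The sketch below does exactly that in Python; the scoring formula, the lexicon entries, and the sample sentences are assumptions made for illustration, not the precise method or data behind the chart.

```python
import re

# Hypothetical lexicon: word -> model coefficient (positive means hawkish).
LEXICON = {
    "tightening": 1.8,
    "inflation": 0.9,
    "weak": -1.2,
    "deterioration": -2.1,
}


def hawkishness_score(text, lexicon):
    """Average lexicon coefficient per word; words not in the lexicon count as 0."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(lexicon.get(word, 0.0) for word in words) / len(words)


# Hypothetical statement fragments, for illustration only.
hawkish_text = "Further tightening may be warranted as inflation rises."
dovish_text = "Weak growth and deterioration in credit."
print(hawkishness_score(hawkish_text, LEXICON))
print(hawkishness_score(dovish_text, LEXICON))
```

Dividing by the word count keeps long and short Statements comparable, which matters when scores are plotted over a decade of releases.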

Candidly, we like the Lexicon approach to Statements because it is fast, simple, easy to explain, and easy to visualize. We could have applied the trained Minutes model directly to the Statements, and we do intend to explore this option more fully in future work.

Having said this, the Lexicon approach shows the Fed has clearly transitioned from “dovish” in the 2008 period to “hawkish” in 2018, with a downtick in “hawkishness” in the most recent meeting.

We believe that most market participants would agree the Fed took a less aggressive position vis-à-vis rates into year-end 2018.

Additionally, the Federal Reserve’s “dot plot” of forward rate expectations from the December meeting vs. the June meeting can be reviewed on page 3 of each of the following documents:
December: June:

The “dot plot” or forward rate curve has declined significantly since June of last year.

For reference, we’ve added a LOESS regression line as a sort of smoothed rolling average to offer a sense of trend over time.

For further information and a free trial of our research platform software, please visit us at

Special thanks to Julia Silge for opening the door to tidytext and machine learning in R:
