What will the next President of the United States face? Topic modeling State of the Union addresses

In an election year, when considering what challenges the next president will inevitably face, it can be helpful to look at recent presidencies to see whether there are consistent themes in presidents’ concerns and goals. One succinct place where these themes are outlined is the annual State of the Union address (often referred to as the SOTU). I examined SOTUs through topic modeling from the end of the Cold War to today, a period long enough to yield a significant corpus of speeches but short enough that the challenges facing each president should be reasonably comparable. I took the text of the SOTUs from the American Presidency Project at UC Santa Barbara and included all SOTUs from 1990 through 2016, excluding the speeches at the beginning of a new administration that the project flagged as not “true” SOTUs (i.e., Clinton’s 1993 speech, Bush’s 2001 speech, and Obama’s 2009 speech), for a total of 24 speeches and over 8,000 words.


After setting up the corpus, I topic modeled the speeches; the result is visually represented in the word cloud above. In preparing the corpus, I removed stop words but kept low-frequency words, and I elected not to stem the words, since stemming had minimal impact on word grouping while making interpretation harder. The top 25 words were not surprising. Words related to Americans as people were the most common: “people” (661 times), “America/American/Americans” (615, 437, and 388 times, respectively), “us” (528 times), “country” (314 times), and “children” (269 times). Also prominent were words related to the American character, such as “can,” “new,” “every,” and “one” (591, 580, 412, and 356 times, respectively); words related to jobs and the economy, including “work” (426 times) and “jobs” (281 times); domestic governance, with “Congress” (358 times); foreign affairs or perhaps American exceptionalism, with “world” (402 times); and a heavy emphasis on time, with “now” (485 times), “years/year” (413 and 390 times, respectively), “tonight” (295 times), and “time” (295 times). These top 25 words give a high-level view of the essentials of the position (see the bottom of this post for the complete list and frequencies).
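The counting step described above can be sketched in a few lines. This is a minimal Python illustration, not the R code behind this post: the `top_words` function, the tiny stop-word list, and the two placeholder sentences are all assumptions made for the example. As in the post, tokens are not stemmed, so “year” and “years” would be counted separately.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
              "we", "our", "that", "for", "it", "this", "will"}

def top_words(speeches, n=25, stop_words=STOP_WORDS):
    """Count lowercased, unstemmed tokens across all speeches, skipping stop words."""
    counts = Counter()
    for text in speeches:
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Made-up placeholder sentences, not real SOTU text.
speeches = [
    "The American people will work, and the people will help.",
    "We must help our children and create new jobs for the people.",
]
top = top_words(speeches, n=3)
print(top)
```

Note that lowercasing merges capitalized and uncapitalized forms of a word, which is a choice made for this sketch and may differ from how the corpus in this post was prepared.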

Another interesting point from the top 25 words is that the word “help” is used so frequently (334 times), yet the count alone does not show what kind of help is meant. For a closer look, I went back to the original corpus to read the word in context. As might be expected, “help” is used in a wide variety of contexts, from foreign policy (“help Ukraine defend its democracy” (2016)) to philanthropy (“help one hungry child” (1991)) to the economy (“help one million young Americans work” (1996)) to the generally inspirational (“help us reach that goal” (2005)).
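Looking at a word in context can be automated with a simple keyword-in-context search. The sketch below is an assumption about how one might do it in Python, not the method used for this post; the `keyword_in_context` function and its `window` parameter are hypothetical names, and the input snippets are just two of the phrases quoted above rather than full speeches.

```python
import re

def keyword_in_context(text, keyword, window=4):
    """Return each exact occurrence of `keyword` with up to `window` words on either side."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, w in enumerate(words):
        if w.lower() == keyword.lower():
            left = words[max(0, i - window):i]
            right = words[i + 1:i + 1 + window]
            hits.append(" ".join(left + [w] + right))
    return hits

# Phrases quoted in this post, keyed by year; a real run would use full speech texts.
snippets = {1991: "help one hungry child", 2005: "help us reach that goal"}
contexts = {year: keyword_in_context(text, "help", window=2)
            for year, text in snippets.items()}
for year, hits in contexts.items():
    print(year, hits)
```

Because the match is on whole words and nothing is stemmed, “helping” or “helped” would not be picked up by a search for “help”.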

While these word frequencies are interesting, the examination of the word “help” shows that the same word can mean vastly different things depending on the context. We therefore need to consider a wider context, using a Latent Dirichlet Allocation (LDA) model. Returning to the original question, what are the consistent issues that a new president would have to face, by topic? Of the top 25 topics, many include some variant of “America,” which is unsurprising since these are SOTUs. Several seem to be related to the economy and jobs (using terms like “work” and “job”), and several others seem to be related to national security and foreign policy, including the war in Iraq but not the war in Afghanistan (using terms like “weapon,” “nation,” “world,” “Iraq,” and “secure”).

The most common topic, according to the analysis I ran (it varies depending on the seed you set for the R analysis), is possibly associated with American exceptionalism, using words such as “America,” “world,” “us,” “must,” “new,” “one,” and “nation.” It was among the top three topics within a SOTU for the years 1990, 1992, 1994-95, 1997, 1999-2000, 2004-06, and 2013.

How do the first and last speeches in the dataset compare? In 1990, the top three topics President Bush discussed were the American exceptionalism topic, the economy (“job,” “American,” “work”), and the Iraq war (“weapon,” “world,” “America,” “Saddam Hussein”). In 2016, President Obama was very focused on the economy, discussing a new economy as his primary topic (“American,” “work,” “change”), followed by another version of the new economy (“new,” “America,” “work,” “job”), and then the same economy topic that President Bush discussed in 1990. Apparently some things change, and some things stay the same.
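For readers curious about what an LDA fit involves, here is a small pure-Python sketch using collapsed Gibbs sampling, one standard way to fit LDA. It is not the R code used for this post, and I am not claiming it matches what the R package does internally. The function names, the hyperparameters `alpha` and `beta`, the number of topics, and the toy documents are all assumptions for illustration. It also shows why results depend on the seed: the initial topic assignments and every resampling step are random.

```python
import random

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (doc_topic, topic_word, vocab) counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                     # total tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to the conditional probability.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab

def top_topics(doc_topic_row, n=3):
    """Indices of the n topics with the most tokens in one document."""
    order = sorted(range(len(doc_topic_row)), key=lambda k: doc_topic_row[k], reverse=True)
    return order[:n]

def top_terms(topic_word_row, vocab, n=3):
    """The n most frequent words assigned to one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: topic_word_row[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]

# Made-up toy documents (already tokenized), not real SOTU text.
docs = [
    "job work american economy job work".split(),
    "weapon world nation secure weapon world".split(),
    "work job economy new job".split(),
    "nation world secure weapon nation".split(),
]
doc_topic, topic_word, vocab = fit_lda(docs, num_topics=2, seed=42)
for k, row in enumerate(topic_word):
    print("topic", k, top_terms(row, vocab))
for d, row in enumerate(doc_topic):
    print("doc", d, "top topic:", top_topics(row, n=1))
```

Rerunning with a different `seed` may reorder the topics or shift which words land in which topic, which is the seed sensitivity mentioned above. The “top three topics within a SOTU” idea corresponds to calling `top_topics` with `n=3` on a speech's row of topic counts.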

To conclude, based on the most common topics in the SOTU addresses since the Cold War, the new president will address American exceptionalism and, most likely, American jobs and the economy. This is not particularly surprising, but it is interesting to see it play out in the topic models.

For the R code and data associated with the topic modeling, see my GitHub portfolio.

*Top 25 words (frequency in parentheses):

  1. people (661)
  2. America (615)
  3. can (591)
  4. new (580)
  5. us (528)
  6. must (518)
  7. now (485)
  8. American (437)
  9. work (426)
  10. years (413)
  11. every (412)
  12. world (402)
  13. year (390)
  14. Americans (388)
  15. make (385)
  16. Congress (358)
  17. one (356)
  18. help (334)
  19. country (314)
  20. know (308)
  21. tonight (295)
  22. time (295)
  23. jobs (281)
  24. need (273)
  25. children (269)


