Two weeks after voters in the United States cast their ballots in the 2018 midterm elections, we find ourselves in the exciting period that is post-election reflection and analysis, whose greatest thrill and challenge is to condense the results of 435 races in the House of Representatives, 35 in the Senate, and 36 for governorships into meaningful insights. There are an overwhelming number of records broken, historic firsts, and 2020 implications to dissect as the final tallies trickle in, but some of the most surprising takeaways from Tuesday night have little to do with the election results themselves. In the first nationwide election since Donald Trump won the presidency in 2016, this round of midterms was inevitably colored by lessons learned from two years ago, and nowhere more noticeably than in the context of pre-election polling.
Few of us who followed the live coverage of 2016’s election night have forgotten the experience of catapulting along an emotional roller-coaster, because for the vast majority of the 48% of the country that voted for Hillary Clinton (plus many of those who voted for Trump, Gary Johnson, or Jill Stein), the night played out very differently than expected. As the recognition of turning tides hit, viewers watched in real time as pundits’ predictions of Hillary’s “landslide victory” and “likely blowout” were quickly replaced by dazed (or gleeful, depending on what channel you were watching) takes on a single question: how could the polls have possibly been so wrong? With an international audience still reeling from the unforeseen results of Brexit a few months prior, vanishing public trust in polling’s accuracy added fodder to a larger, global assault on the “mainstream media” and expert credibility.
With the advantage of two years of hindsight, we now have an answer to the question of how polls led us astray: they didn’t. Rather, polling accuracy at the national level in 2016 was comfortably in line with historical trends since 1968. Yet even now, despite intensive coverage of the matter from polling experts like FiveThirtyEight’s Nate Silver, respected think tanks, and major news organizations, it’s been hard to shake the feeling – one which I, for one, jumped to instinctually, and which was widely echoed in the coverage of the weeks immediately following the election – that “the data failed us,” or that “public-opinion surveys and election forecasters misread the outcome” in a systematic and devastating way.
None of this is to say that polling was perfect in 2016 – sampling issues meant that important segments of the population were overlooked, and declining response rates will continue to pose a challenge in the age of caller ID. The reality is that polls should always be taken with a massive grain of salt, and this was the all-too-important ingredient missing in 2016. Single-number simplifications like win-probabilities and the needle-esque graphics often seen in the leadup to an election belie the complexities of their underlying models. Those models should, according to our enlightened 2018 perspective, incorporate a range of factors in addition to pre-election polls: historical trends, political polarization, economic performance, and expert opinions on things like expected turnout, all of which bolster their accuracy and robustness. When the status quo is to gloss over these technicalities, it becomes impossible to think critically about how much weight news outlets choose to afford each of these factors, or to red-flag models that rely entirely on one factor to the exclusion of the rest. When that happens, the single factor is almost always generic polling (“which party will you vote for on election day?”), since that remains the best isolated predictor of overall results in a legislative election.
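To make the weighting question concrete, here is a minimal Python sketch of how a multi-factor forecast might blend several signals into a single predicted margin. The factor names, margins, and weights are entirely hypothetical illustrations, not any outlet’s actual model.

```python
def blended_margin(factors: dict, weights: dict) -> float:
    """Weighted average of per-factor predicted margins (percentage points).

    A polls-only model is the degenerate case with a single factor;
    a multi-factor model dilutes any one signal's influence.
    """
    total_weight = sum(weights[name] for name in factors)
    return sum(factors[name] * weights[name] for name in factors) / total_weight

# A "lite"-style model driven by one factor: the generic ballot alone.
polls_only = blended_margin({"generic_ballot": 4.0}, {"generic_ballot": 1.0})

# A multi-factor blend (illustrative weights): the polling signal still
# dominates, but fundamentals and expert ratings pull the estimate around.
multi = blended_margin(
    {"generic_ballot": 4.0, "fundamentals": 1.5, "expert_ratings": 2.0},
    {"generic_ballot": 0.6, "fundamentals": 0.25, "expert_ratings": 0.15},
)
```

The point of the sketch is simply that the choice of weights is an editorial decision, and one worth scrutinizing whenever a forecast is reported as a single number.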
We can see the benefits of taking multi-factor models into consideration by analyzing how three pre-midterm prediction models produced by FiveThirtyEight fared. Its forecast page allowed viewers to select between a “lite” model based solely on polling data, a “classic” model combining polls with additional characteristics of each district, and a “deluxe” model that also included expert ratings. The forecasts launched in August of 2018 and the models were updated daily. The table below summarizes the average accuracy of these models in predicting the winners of 417 House elections (18 were uncontested) and breaks them down according to the New York Times’ classification of narrow and tossup races.
In the case of pure win-probabilities, the deluxe model produced the highest chances for winning candidates in 65% of the elections (section 1. in the table) and the highest average win-probabilities (2.), including in narrow and tossup races. The classic model performed slightly better overall than the deluxe in predicting vote share (3.), but not in tossup and narrow races.
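The scoring behind this kind of comparison is straightforward to replicate. The sketch below uses made-up probabilities for three hypothetical races – not FiveThirtyEight’s actual numbers – to show how one would compute both metrics: the share of races in which each model gave the eventual winner the top probability, and the average probability each model assigned to winners.

```python
# Each entry holds the probability that three hypothetical models assigned
# to the candidate who actually won (all numbers are invented for illustration).
races = [
    {"lite": 0.55, "classic": 0.60, "deluxe": 0.64},
    {"lite": 0.90, "classic": 0.88, "deluxe": 0.93},
    {"lite": 0.48, "classic": 0.52, "deluxe": 0.51},
]

models = ["lite", "classic", "deluxe"]

# Metric 1: in what share of races did this model give the winner
# the highest probability of the three models?
top_share = {
    m: sum(1 for r in races if r[m] == max(r.values())) / len(races)
    for m in models
}

# Metric 2: average probability the model assigned to eventual winners.
avg_prob = {m: sum(r[m] for r in races) / len(races) for m in models}
```

With real race-level forecast data in place of the toy dictionaries, the same two comprehensions reproduce the comparisons summarized in the table.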
Drilling down to the district level, it’s clear that in many cases where the classic or lite models outperformed the deluxe, the differences in win-probabilities were immaterial. Texas’s 13th district (4.) drew attention amid Beto O’Rourke’s impressive statewide challenge to Ted Cruz in traditionally blood-red territory, yet all three models put the Republican’s chances of winning the district within 0.001 percentage points of 100%. There were, however, a few exceptions in which the lite or classic models did discernibly better than the deluxe, including Iowa’s 4th (5.) and California’s 45th (6.) districts. Iowan incumbent Steve King faced fallout over his pro-gun stance after the Pittsburgh shooting and accusations of racist rhetoric prior to the election, leading to a poor fundraising cycle and public efforts by Republican members of Congress to distance themselves from his campaign. With King’s opponent highlighting his centrist position on gun control, these factors likely contributed to King’s dampened win forecasts in the classic and deluxe models. California’s 45th district was one of the midterms’ closest races, in large part due to uncertainty regarding how voters in a rapidly diversifying but traditionally moderate-conservative area would react to incumbent Mimi Walters’ votes to repeal Obamacare and eliminate the state and local income tax deduction. Democratic challenger Katie Porter won with 51.2% of the vote.
FiveThirtyEight’s approach to forecasting is often considered a gold standard, and the main takeaway from this overview of its three models should not be that one model is better than the others, but that there is substantial value added in questioning the factors that drive the differences between them. Within the broader context of the ongoing autopsy of 2016’s lessons learned, pollsters and members of the media alike deserve healthy praise for their efforts to demystify and properly qualify polling in the leadup to the 2018 midterms. The New York Times tracked polls live from the first to the last voter surveyed, showing in real time how their ebbs and flows changed its predictions. The Economist published a comprehensive methodological breakdown of its predictive model before the start of the campaign season and referenced it in all subsequent predictions. Almost every news organization has shifted away from reporting single-number win-probabilities in favor of showing each candidate’s predicted vote share, and terms previously dismissed as statistical mumbo-jumbo, like “margin of error” and “confidence interval,” are now the norm in print and televised coverage alike. Disclaimers about polling’s fallibility are now afforded about as much white space and airtime as the predictions themselves. Perhaps these measures were driven more by a desire to safeguard against the type of backlash we saw in 2016 than by some altruistic sense of public responsibility, and perhaps those who aim to sow doubt in the institutions of the press will seize on these qualifications and twist them into some admission of guilt, but the net result, I think, has been positive.
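For reference, the “margin of error” now quoted so routinely in coverage is usually the sampling error of a simple random sample under the normal approximation, which a few lines of Python reproduce. Note that it captures sampling variability only – it says nothing about the nonresponse and coverage problems discussed earlier.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error for a sample proportion p from n respondents.

    Uses the normal approximation; z = 1.96 gives the familiar 95%
    confidence level. Returns a fraction (0.031 means +/- 3.1 points).
    """
    return z * math.sqrt(p * (1 - p) / n)

# The canonical example: a 1,000-person poll at 50% support comes with
# a margin of error of roughly +/- 3.1 percentage points.
moe = margin_of_error(0.5, 1000)
```

Because the error shrinks only with the square root of the sample size, quadrupling a poll’s respondents merely halves its margin of error, which is why even well-funded polls rarely get much tighter than a couple of points.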
If the uncertainty surrounding polling and predictive models is now explicitly clear, one might reasonably wonder why anyone bothers with them at all, given how consequential they have been for candidate strategy and individual behavior whether or not they hit the mark. It’s been suggested that the (incorrect) popular wisdom that Hispanic voters decided the 2012 election for Obama – a belief based on early exit-polling data – was a major contributing factor in the bipartisan support for immigration reform that immediately followed that election. In addition to potentially costing her Michigan, I suspect that inflated predictions of Hillary’s margins in traditionally Democratic strongholds gave some voters the confidence to cast protest votes for third-party candidates. This would suggest a sort of reverse-bandwagon effect, a relative of the “wisdom of crowds” phenomenon that is equally concerning in its ability, according to a 2014 randomized controlled trial by Rothschild and Malhotra, to influence voter behavior. Evidence that polls were inducing conformity in 2012 prompted Nate Silver to muse that he might stop blogging if such trends continued.
Despite these concerns, polling still offers the best measure of public opinion available to policy-makers, and until there is an alternative, it isn’t going away. Diversified modeling and vastly improved coverage of it in this year’s midterms allowed candidates to pivot their strategies effectively in tandem with changes in public opinion on issues like gun control following the Pittsburgh shooting. Given social media’s incredible pulse-taking capabilities, the increasing popularity of town halls and advisory questions or non-binding referenda, and the relatively small and decreasing cost of online polling, the case can certainly be made that candidates and elected officials should diversify their policy-gauging toolbox and increase their skepticism towards polling in the same vein as media organizations. But these options are far from perfect – such referenda are far from the norm, my Nina goes to every town hall but will likely never join Facebook, and the online profiles of actual citizens are currently indistinguishable from propaganda bots. Until these challenges are addressed, old-fashioned telephone polling and long-form paper surveys will remain the most representative and trustworthy mechanisms through which politicians can facilitate the democratic process – grains of salt and all.
Featured image source: https://static01.nyt.com/images/2018/03/12/upshot/up-needle-image/up-needle-image-jumbo.png?quality=90&auto=webp
Alaina Rhee is a first-year Master of Public Administration student at the LSE. Alaina previously studied Economics and International Relations at the University of Virginia (2014) and worked as a research analyst at the International Monetary Fund, where she focused on inequality and migration.