Are preseason college football polls any good?
The refrain sounds out every late summer as practices begin across college campuses nationwide: “college football preseason polls are worthless.” That raises an interesting question: how worthless are they?
To address this, I’ve compiled a data set of every team’s season-by-season record, including polling data, head coach, and bowl result. This set will become the foundation for a great many statistical inquiries into college football. In the current context, the analysis is quite straightforward and aims to answer 3 key questions:
#1: How accurate are pre-season polls compared to the post-season poll results?
#2: Can the number of wins be projected by polls? This analysis is easier to digest, and interesting as it touches on conference affiliation.
#3: Is there evidence that a particular conference (or a group of conferences) is perpetually overrated at the beginning of the season? (Section 3: published weekend of August 25.)
In this post, we’ll examine the first of these questions while leaving the latter two for parts 2 and 3.
So how do we go about testing this? Statistically it seems rather straightforward. The questions I want answered are:
- Do more highly ranked teams at the pre-season typically finish the season more highly ranked? (Are pollsters good at identifying ‘good teams’?)
- Do unranked teams tend to stay unranked by the end of the season? (Are pollsters good at identifying ‘bad teams’?)
- Is the pre-season rank an unbiased estimator for post-season rank? If AP says your favorite team is #15, on average is #15 an appropriate estimate for how that team will perform?
We test each of these in detail.
1. Teams That Start Ranked
We begin by looking at only teams that were ranked in the top 25 of the AP poll to start each season. How did they do? Were they reasonably close to their starting rank or was that rank way off?
Summary data of the past results over the last 20 seasons (1998-2017) are below:
From this simple table, we can immediately see that a team finishes within 10 ranks of its pre-season rank only 17.6% of the time (roughly 1 in 6). A team that starts the season ranked drops out of the polls entirely 38.4% of the time.
At a glance, the value of the polling appears dubious, indeed.
However, all is not what it seems. Given the percentages are so large and represent such a disproportionate number of teams, some further study into the Ranked->Unranked category is warranted. We start by simply looking at which teams are falling into unranked status by the end of the season:
Aha. The first evidence we have that pre-season polls actually mean something. There is a clear relationship between a team’s rank and the likelihood that team will finish the season unranked. The lower the rank (in terms of quality, not in terms of overall number ranking) the more likely that team will be unranked in the ultimate poll. In fact, looking at this relationship graphically, the strength of that relationship is quite robust:
In simple terms, this means that each one-position drop in pre-season rank is associated with roughly a 2.5 percentage point higher chance of ending the season unranked. By the time we are into the #16 to #25 spots, it is more likely than a coin flip that a team will end the season out of the polls.
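To make the mechanics of that relationship concrete, here is a minimal sketch of how a drop-out rate per pre-season rank and a fitted slope could be computed. The function names and the input layout (pairs of pre-season rank and final rank, with None meaning unranked) are assumptions for illustration, not the layout of the actual data set, and no real results are reproduced here.

```python
from collections import defaultdict


def dropout_rate_by_rank(seasons):
    """Share of teams finishing unranked, grouped by pre-season rank.

    `seasons` is an iterable of (preseason_rank, final_rank) pairs for teams
    that started ranked; final_rank is None when the team finished unranked.
    (Hypothetical layout, chosen only for this sketch.)
    """
    started = defaultdict(int)
    dropped = defaultdict(int)
    for pre_rank, final_rank in seasons:
        started[pre_rank] += 1
        if final_rank is None:
            dropped[pre_rank] += 1
    return {rank: dropped[rank] / started[rank] for rank in sorted(started)}


def fitted_slope(rates):
    """Least-squares slope of drop-out rate against pre-season rank.

    Needs at least two distinct ranks; the slope is the change in drop-out
    probability per one-position step in rank.
    """
    ranks = list(rates)
    n = len(ranks)
    mean_x = sum(ranks) / n
    mean_y = sum(rates.values()) / n
    sxx = sum((r - mean_x) ** 2 for r in ranks)
    sxy = sum((r - mean_x) * (rates[r] - mean_y) for r in ranks)
    return sxy / sxx
```

A slope of 0.025 from such a fit would correspond to the 2.5% per rank position quoted above.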
I suppose this is a bit of bittersweet vindication for the pollsters: they appear able to assess which teams are most likely to be quite good and slot them at the top, but at the bottom end of the ranks, having a hit rate that is worse than a coin flip is not stellar.
2. Teams That Start Unranked
Turning to whether or not pollsters can at least identify the poorer-quality teams, the evidence suggests that they are in fact pretty good at this.
Here we determine the likelihoods of ending up ranked for either Power 5 conference members or Non P5 members:
This means that a team starting the season unranked is overwhelmingly likely to remain unranked at the end of the season. P5 teams have a better chance of finishing ranked, but only marginally so. Taken together, pollsters get who is NOT amongst the best 25 teams in the country right 9 times out of 10.
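As a rough illustration of the calculation behind these likelihoods, the sketch below computes the share of initially unranked teams that finish ranked, split by Power 5 membership. The function name and the input layout (one pair of booleans per team-season) are assumptions made for this example only.

```python
def share_finishing_ranked(teams):
    """Share of initially unranked teams that finish ranked, by P5 status.

    `teams` is an iterable of (is_power5, finished_ranked) boolean pairs,
    one per team-season that started unranked (hypothetical layout).
    """
    counts = {True: [0, 0], False: [0, 0]}  # [finished ranked, total]
    for is_p5, finished_ranked in teams:
        counts[is_p5][1] += 1
        if finished_ranked:
            counts[is_p5][0] += 1
    return {
        ("P5" if is_p5 else "Non-P5"): (ranked / total if total else None)
        for is_p5, (ranked, total) in counts.items()
    }
```

One minus each share is the pollsters' hit rate at keeping the right teams out of the initial poll.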
3. Do Pre-Season Polls Have Predictive Power?
We are approaching the crux of the issue. Generally, it seems pre-season polls do slot more talented teams more highly in the rankings, but teams in the teens (and lower) are basically a crapshoot. However, that is very far from saying that pre-season polls have any form of predictive power whatsoever on just how good a team will turn out to be.
To assess this, I invert the ranks so that being #1 in a poll is effectively worth 25 points, #2 is worth 24, and so on. The idea is that #25 is assigned a value (1) slightly better than being unranked (0).
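The inversion can be written as a one-line mapping. This is a minimal sketch; the function name and the use of None for an unranked team are choices made for illustration.

```python
from typing import Optional


def invert_rank(rank: Optional[int], poll_size: int = 25) -> int:
    """Convert a poll rank to points: #1 -> 25, #25 -> 1, unranked -> 0.

    `rank` is None for an unranked team (an assumption of this sketch).
    """
    if rank is None:
        return 0
    if not 1 <= rank <= poll_size:
        raise ValueError(f"rank must be between 1 and {poll_size}")
    return poll_size + 1 - rank
```

On this scale a higher number always means a better team, and falling out of the poll is just one step below being ranked #25.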
Then, we conduct 2 sets of tests on 2 different sets of data.
The basic data sets are:
- All teams that began a season ranked or ended a season ranked
- Only teams that were ranked both at the beginning of a season AND at the end of a season. Since by definition a lot of teams (50%+) fall out of the ranking, this is a narrower scope for consideration.
We look at 2 simple models: one with the typical regression formulation (slope + intercept), and an alternate with a slope only (no intercept), in which the end-of-season rank depends solely on the pre-season rank. The former tells us if there is any relationship between pre-season rank and post-season rank at all (e.g. better teams are ranked higher), and the latter tells us whether or not the pre-season rank is a good predictor of the end-of-season rank. For example, if the polls rank a team #15 to start the season, we expect a huge margin of error, but does the overall data suggest on average that #15 was approximately correct?
So to summarize:
- Ordinary model: If the slope is > 0, pollsters aren’t total idiots and can do better than a monkey throwing darts at the newspaper: the better the pre-season rank, the more likely the team is actually somewhat good.
- Slope Only: If the slope = 1, pollsters average out to getting the right ranking over time and in fact are really, really good at their jobs in slotting teams in the pre-season.
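For readers who want to see the two formulations side by side, here is a minimal sketch of both fits using plain least squares on the inverted (points) scale. The function names are hypothetical, and the sketch returns coefficient estimates only; the significance tests discussed in this post are not included.

```python
def fit_ordinary(pre, post):
    """Least-squares fit of post = intercept + slope * pre.

    `pre` and `post` are equal-length sequences of inverted ranks (points).
    Returns (slope, intercept).
    """
    n = len(pre)
    mean_x = sum(pre) / n
    mean_y = sum(post) / n
    sxx = sum((x - mean_x) ** 2 for x in pre)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pre, post))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_slope_only(pre, post):
    """Least-squares fit of post = slope * pre, with no intercept term."""
    return sum(x * y for x, y in zip(pre, post)) / sum(x * x for x in pre)
```

On the points scale, a slope-only estimate below 1 would mean teams finish with fewer points than they started with on average, that is, lower in the rankings than the pre-season poll placed them.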
As before, the 1998-2017 football seasons are considered:
The coefficient estimates for “Ordinary” are statistically different from 0, and those for “Slope Only” are statistically different from 1.
The latter result is actually quite important, as it means that teams ranked in the initial poll statistically finish below their pre-season ranking.
Call it buying into the hype, giving the benefit of the doubt to marginal teams, or whatever you like; the evidence is quite clear that teams typically underperform their preseason rank, and this holds when I restrict the data to ONLY teams that were ranked both at the beginning and end of seasons.
Putting It All Together
The conclusions are thus:
- Pollsters are better than random chance at identifying the quality of a team.
- They do a pretty good job at identifying the bad teams by not having them ranked.
- At a high level, the initial slotting correlates to how good a season the team will have.
- However, the evidence is that teams in the initial poll are ‘overrated’ and fall in the ranks by the end of the season.
So, are polls worthless? Not entirely. At the top end of the ranking spectrum, one should presume that a team ranked there will indeed be among the best teams in the country at the end of the year. There is not a lot of evidence that being in the teens and twenties is of much value, however.
In part 2, we project the number of wins a team will have and begin to tie conference affiliation into the mix.