That TCS Poll: Or, why Cambridge’s Liberals may be saved by Science
May 1, 2015
OK, this will just be a short post to tell you guys about some number crunching I did lately. The Cambridge Student, one of Cambridge’s better student papers, did an election survey recently, with an analysis by Colm Murphy here:
At over 700 students, the sample is a pretty meaty one, and the coverage is, compared to most of the mainstream media’s reporting on polling, actually extremely good. TCS wins a LOT of points for stressing that it is only one of a number of polls, and broadly speaking gives fairly balanced coverage of their raw data.
However, it is those last two words that are the potential issue with directly interpreting the TCS results. The paper has very kindly agreed to give me their data (along with Phil Rodgers, who has done an interesting time-stamp analysis here: https://philrodgers.wordpress.com/2015/04/30/a-closer-look-at-the-tcs-student-election-survey/ ) – and as such I can show you exactly what I mean.
PART 1: Numbers!
There were 732 students in the election survey. The trouble is, if you select 732 people from a population of a few thousand, you can’t be sure that your sample accurately represents the full picture, especially when there are a number of variables that could affect how people vote. This is an even worse issue in straw polls like this, which are self-selecting – there may be inherent skews in the data, not because people answered wrongly or dishonestly, but because some people are more likely to read and respond to polls than others. TCS didn’t collect a lot of the standard respondent metrics like income class and gender, but they did collect some data on subject, year, and college, as well as people’s voting patterns. To show how this affects the data, I’m just going to use one metric – subject choice. The TCS analysis very effectively shows how much of a voting impact subject choice can have – historians heavily backed Labour, for example, whereas NatScis backed the Lib Dems and Mathmos tended to vote Conservative (and the Greens had the philosophers on their side).
We know roughly how many people do each subject at Cambridge. The central university presumably has the exact figures, but I can’t find them, so I’m working off admissions stats, which give the rough percentages. These tell us what percentage of the sample “should” be doing each subject to make it representative of the university as a whole. We can then compare these to the original numbers from the TCS survey, and find out by how much each subject was overrepresented or underrepresented. The answer, conveniently for this example, is that several subjects deviated significantly from their actual university percentages in the TCS sample. (I should note that I’m also assuming that differences in subject in the sample don’t reflect an actual difference in propensity to vote, and that the people from each subject within the sample would vote similarly to the ones outside it; both of these assumptions may be open to challenge, but it seems more inherently probable to me that there’s a subject difference in who saw a survey that spread via social media than that there’s a large subject difference in voting propensity.)
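To make the comparison concrete, here’s a minimal sketch of that calculation in Python. The percentages below are invented for illustration – they are not the real TCS or admissions figures – and I’ve only picked them so the resulting factors land near the quoted ones for History and Engineering:

```python
# Representation factor: how many times its "fair" voting power a subject
# had in the survey. Both sets of percentages below are made up for
# illustration; they are NOT the real survey or admissions data.
sample_share = {"History": 11.0, "Engineering": 3.0}      # % of survey respondents (invented)
university_share = {"History": 5.0, "Engineering": 10.0}  # % of all students (invented)

def representation_factor(subject):
    """Share of the sample divided by share of the university."""
    return sample_share[subject] / university_share[subject]

for subject in sample_share:
    # A factor above 1 means over-sampled; below 1 means under-sampled.
    print(subject, round(representation_factor(subject), 2))
```

With these toy shares, History comes out over-represented and Engineering under-represented, mirroring the shape of the real results below.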
So, what we end up with is an analysis of how much extra “voting power” some subjects got compared to their actual percentage among students. The results for a few subjects are as follows, expressed as a multiple of their correct voting power:
ASNAC – 2.7 times (highest)
History – 2.2 times
English – 1.6 times
Geography, HSPS – 1.4 times
Maths – 0.89 times (closest to balance)
Education – 0.86 times
NatSci – 0.84 times
Law – 0.78 times
Land Economy – 0.33 times
Engineering – 0.30 times (lowest)
What does all that mean? Essentially, History and English were over-sampled, meaning that they contributed much more to the TCS data, whereas NatSci and especially Engineering lost out – every engineer’s vote in the survey had to be multiplied by a factor of more than three to account for all the extra engineers in the university.
The obvious next question is “so what?”
PART 2: So This
We know that we oversampled englings and historians and undersampled engineers and scientists. The next step is quite simple; we multiply up or down people’s votes to correct for that!
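That multiplying up and down is just a weighted sum: divide each subject’s raw votes by its representation factor, then recompute the party shares. A minimal sketch – the vote counts here are invented, not the TCS figures; only the 2.2 and 0.30 factors are taken from the list above:

```python
# Reweight each subject's raw votes by the inverse of its representation
# factor, then recompute party shares. The vote counts are made up purely
# to illustrate the mechanics; the factors are from the list above.
raw_votes = {
    "History":     {"Lab": 60, "LD": 20},  # invented counts
    "Engineering": {"Lab": 10, "LD": 15},  # invented counts
}
factor = {"History": 2.2, "Engineering": 0.30}

weighted = {}
for subject, parties in raw_votes.items():
    for party, n in parties.items():
        # Dividing by the factor scales an over-sampled subject down
        # and an under-sampled one up.
        weighted[party] = weighted.get(party, 0.0) + n / factor[subject]

total = sum(weighted.values())
for party, votes in sorted(weighted.items()):
    print(party, f"{100 * votes / total:.1f}%")
```

Even with made-up numbers you can see the effect: the raw counts give Labour a huge lead, but once the under-sampled engineers are scaled up, the gap narrows sharply.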
And the transformation looks like this:
Which I freely admit looks underwhelming as hell. Nobody even overtook anyone! But the numbers are different, and the difference is crucial. With the weighting, the extra weight given to Lib Dem-friendly engineers and scientists cuts Labour’s lead in the poll from 12% to just under 6%. We can compare these figures with the two Ashcroft polls that have been done for Cambridge. On the later poll, which was better for the Lib Dems, Labour would (assuming other factors remained equal) need a 15% lead over the Lib Dems among students in order to counteract the Liberal lead in the wider city and win. On the earlier poll last September, which was better for Labour, they needed just over a 6% lead. Comparing those thresholds to the numbers we’ve just calculated instantly shows why the weightings are important. A 12% lead is close to being large enough to overhaul a strong Lib Dem lead in the rest of the city – but one of only 6% gives CULC far more work to do in the last week of campaigning, and indicates that the science-side vote might keep Labour’s lead low enough to save Huppert even if Labour can cut his lead in the city itself back to where it was in September’s poll.
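The comparison above boils down to checking each student lead against the lead Labour needs under each Ashcroft poll. A trivial sketch, using the figures quoted in the post rounded to whole points (the strict “greater than” encodes “just under 6%” falling short of “just over 6%”):

```python
# Leads Labour needs among students, per the two Ashcroft polls (whole points).
needed_lead = {"Ashcroft, later poll": 15, "Ashcroft, September poll": 6}
# Student leads from the TCS data: raw, and after subject weighting.
student_lead = {"raw TCS numbers": 12, "weighted numbers": 6}

for poll, need in needed_lead.items():
    for source, lead in student_lead.items():
        # Strict comparison: a lead of "just under 6%" does not clear a
        # threshold of "just over 6%".
        verdict = "enough" if lead > need else "not enough"
        print(f"{source} ({lead}%) vs {poll} (needs {need}%): {verdict}")
```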
I’m not suggesting that all of this is definitive proof; the TCS dataset lacks some of the information needed to weight it more fully, and there may well be inherent biases that we can’t pick up without that additional data. In addition, given it’s a straw poll, party campaigning may have given pushes to certain groups of voters and obscured the numbers. The above is, if anything, mostly intended as a guide to why straw polls may not tell you exactly what the raw data appears to, why weighting is important, and how it can make a real difference to the result of elections. Nonetheless, I suspect the broad finding – that pro-Labour humanities students are oversampled and pro-Liberal scientists undersampled in straw polling – is significant, and may mean that Labour is not quite as dominant among students as they have initially appeared to be from some recent student reporting.
Many thanks again to TCS for making this possible, especially Colm and Jack for dealing with my impertinent thirst for numbers to crunch.