Thursday, August 07, 2008

Roller-coaster polling

Keep your barf bags handy

I like to follow political polls and I'm encouraged that my preferred presidential candidate continues in most reports to hang on to a significant lead. Nevertheless, I am heartily tired of the constant “analysis” of nonexistent volatility. It's a pseudo-drama ginned up by a combination of deep ignorance and ratings-driven sensation mongering.

It's polling, suckers. It's a statistical art with well-documented error bars. The results are not five-decimal-place scientific calculations. They're probabilistic estimates. The numbers are going to jump up and down even if nothing changes. Got that?

For purposes of illustration, suppose that a poll's accuracy is described as being within 3 percentage points 90% of the time. Got that? Then about 10% of the time it will be off by more than 3 points. If two candidates are within about 5 points of each other, taking 3 points away from the leader and giving them to the other guy will reverse the race! Only not really, because the poll would be in error in that instance. These flukes are not avoidable and they provide grist for the mills of the talking heads who will then scream about upsets and stunning surprises. They're just idiots. Or ignoramuses. Maybe both.

Let's take the illustration a few steps further. We can use a normal distribution (“bell curve”) to model the results of polls whose errors are less than 3 points 90% of the time. With the assistance of Excel's random number generator and built-in normal distribution functions, I ran two dozen trials and plotted the results of the simulated polling errors:

The model calls for the errors to fall within plus-or-minus 3 points 90% of the time (in the long run, mind you). Ten percent of 24 trials is 2.4, so we should expect maybe 2 or 3 results to fall outside the plus-or-minus 3 band. In this little experiment, it happened four times, twice going high and twice low. Please note the dashed line marking the −2.5 boundary. In a race with an actual 5-point margin, whenever the poll underestimates the leader's share by more than 2.5 points (and hands those points to the other guy), the lead changes hands. According to the poll, that is, not reality. Imagine the breathless reports that would immediately blossom on the cable news networks and radio talk shows, in newspapers, and on blogs. Panic! Or elation!
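
For anyone who wants to try the experiment without Excel, here is a minimal Python sketch of the same idea. It is not the spreadsheet behind the graph. It assumes, as above, that polling errors are normally distributed around zero and scaled so that 90% of them land within plus-or-minus 3 points. The names (`SIGMA`, `simulate_errors`, `summarize`) and the seed are made up for illustration, and your counts will differ from mine because the draws are random.

```python
import random
from statistics import NormalDist

# Assumption: errors are normal with mean 0, scaled so that 90% of them
# fall within +/-3 points. That puts 3 points at the 95th percentile.
Z90 = NormalDist().inv_cdf(0.95)   # roughly 1.645 standard deviations
SIGMA = 3.0 / Z90                  # standard deviation of the error, in points


def simulate_errors(n_polls=24, seed=None):
    """Draw one simulated polling error per poll; the electorate never changes."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, SIGMA) for _ in range(n_polls)]


def summarize(errors, band=3.0, flip_at=-2.5):
    """Count errors outside the +/-band and errors below the flip threshold."""
    outside = sum(1 for e in errors if abs(e) > band)
    flips = sum(1 for e in errors if e < flip_at)
    return outside, flips


if __name__ == "__main__":
    errors = simulate_errors(n_polls=24, seed=2008)  # seed chosen arbitrarily
    outside, flips = summarize(errors)
    print(f"sigma = {SIGMA:.2f} points")
    print(f"errors outside +/-3 points: {outside} of {len(errors)}")
    print(f"apparent lead changes (error below -2.5): {flips}")
```

Change the seed, or leave it out, to see how much the picture shifts from one run to the next with no change at all in the underlying race.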

Another thing to keep in mind: The model did the exact same randomized calculation each time, assuming no actual change in the electorate. You still get variation, an inevitable consequence of sample-based statistics. You're stuck with it and you have to live with it. And pundits love to live with it because it gives the illusion of motion even when none exists.

People should just calm down.

I swear that I did not cherry-pick my results in order to present one that looked especially dramatic. The graph above shows my initial run. I cranked out some more examples, just to see what they looked like. These appear, together with the original experiment, in the graph below. Does it look chaotic enough for you? (Only the blue graph failed to produce an artificial swap in the lead.) See? Major developments in the campaign! We have headlines!
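
It is no accident that nearly every run produces a fake lead change. Under the same assumed model (normal errors, 90% within plus-or-minus 3 points, a true 5-point margin, each poll independent of the others), the chance that any one poll shows the lead swapping works out to roughly 8 or 9 percent, and the chance of at least one swap somewhere in 24 polls comes to a bit under 90 percent. Here is a short Python sketch of that arithmetic. The names are illustrative, and the independence assumption belongs to the toy model, not to real tracking polls.

```python
from statistics import NormalDist

# Assumption: same error model as the simulated polls above.
SIGMA = 3.0 / NormalDist().inv_cdf(0.95)   # 90% of errors within +/-3 points
error_model = NormalDist(mu=0.0, sigma=SIGMA)

# Chance that a single poll understates the leader's share by more than 2.5 points.
p_flip = error_model.cdf(-2.5)


def p_at_least_one_flip(n_polls):
    """Chance that at least one of n independent polls shows a fake lead change."""
    return 1.0 - (1.0 - p_flip) ** n_polls


if __name__ == "__main__":
    print(f"single poll: {p_flip:.3f}")
    print(f"24 polls:    {p_at_least_one_flip(24):.3f}")
```

So a run with no artificial swap at all, like the blue graph, is the unusual case.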


unapologetic said...

Idea: plot the result including the distribution. Of course, most people can't read a probability distribution worth a damn, so here's what I'm thinking...

Don't plot the curve. Plot a vertical bar shaded more or less densely according to the probability density function. Then for each [time-unit] of your tracking poll, plot a bar graph, but use these "density bars" instead of a simple height. Even better would be if you can smooth out the jumps at the edges of the bars.

Of course I can't program to save my life, so I don't have any examples of this... maybe a bleg...

Sili said...

I hate that!

They do it here, too. Even outside of election time. And don't get me started on referenda.

I wish I knew what to do to get a ban on polling. At least in the run-up to elections. Including bloody exit polls.

/rant /rage

Proud2bHumble said...

Would peckers account for holes in telephone polls?
Cuz if you pick a pack of polling peckers,
you'll probably find their fingers free
nearly never nine to nine,
when paleold voters dine in their homes at scheduled times,
while the working poor
ain't rolled in the door
from their second job at the store,
and the youngsters don't mind
not bein' landlined...
So the peckers
hardly penetrate,
so demoscriminate...
Just like a buncha peckers,
always skewing things up.


William said...

Maybe I'm just not paying enough attention, but I never hear the talking heads say anything like "accuracy within 3 percentage points 90% of the time" -- it's always just "margin of error, 3 percentage points", implying that the error will never be outside that range.

Perhaps "margin of error" is more a term of art than I suspected?

Zeno said...

You're right, William. The talking heads never include the specifics of what plus-or-minus 3 points really means. Even print media, where they have more room for details, avoid talking about confidence intervals and the probability of being within the so-called margin of error. It's a shortcut that misleads people and suggests more precision than is actually the case.

Don't hold your breath expecting them to fix that.