Wednesday, August 29, 2012

Stats without baselines

I was listening to NPR this morning, and near the end of Marketplace Morning Report came an error seen and heard all too often: statistics that seem to impart deep meaning but are in fact meaningless because we lack a reference point from which to judge them.

The specific occurrence in this case was a brief comment on Allstate Insurance's latest release of its annual report, "Allstate America's Best Drivers Report". The two snippets were that Philadelphia drivers were 64% more likely to be in a collision than the average driver nationally, and that the safest drivers are in Sioux Falls, South Dakota, where drivers on average go 14 years between collisions. The actual official report is here.

Leaving aside the gap between how each statement is worded and what the statistics actually say (I may have the quotes wrong as well), the bigger concern, for me at least, is that we really don't know whether those numbers are in any way significant. To know that, we'd need to know more about the full data set, including the actual averages and the dispersion. We'd also probably like to know how volatile these statistics are from year to year and how they were computed. But we don't know any of that from the report, so all we are left with are the impressions the numbers make at first glance, which are highly subject to personal bias and incorrect interpretation. We'd also like to know if the statistics are adjusted for factors such as miles driven and conditions like weather (they are not).
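To make the missing-baseline problem concrete, here is a minimal sketch in Python. It assumes that "64% more likely" means Philadelphia's annual collision rate is 1.64 times the national rate, and the national baselines it loops over are hypothetical placeholders of my own, not figures from the report. The function names are likewise just for illustration.

```python
# Sketch: what "64% more likely" implies depends entirely on the baseline.
# Assumption: "64% more likely" multiplies the annual collision rate by 1.64.

SIOUX_FALLS_YEARS = 14.0       # years between collisions, from the broadcast
PHILADELPHIA_RELATIVE = 1.64   # relative to the (unstated) national average


def rate_from_interval(interval_years: float) -> float:
    """Average collisions per driver per year for a given interval."""
    return 1.0 / interval_years


def interval_from_rate(rate_per_year: float) -> float:
    """Average years between collisions for a given annual rate."""
    return 1.0 / rate_per_year


def philadelphia_interval(national_interval_years: float) -> float:
    """Philadelphia's years between collisions under an assumed baseline."""
    national_rate = rate_from_interval(national_interval_years)
    return interval_from_rate(PHILADELPHIA_RELATIVE * national_rate)


if __name__ == "__main__":
    # Hypothetical national baselines; the broadcast gave none.
    for national in (8.0, 10.0, 12.0):
        philly = philadelphia_interval(national)
        print(f"national {national:.0f} yrs -> Philadelphia {philly:.1f} yrs, "
              f"Sioux Falls {SIOUX_FALLS_YEARS:.0f} yrs")
```

The same "64%" maps to a different number of years between collisions for every baseline you try, which is exactly why the figure tells us so little on its own.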

But thinking about both numbers together, we can cobble together a little logic suggesting that they are probably not significant. It is highly unlikely that drivers in Sioux Falls, or any other city, are that different from drivers in any other American city. At 14 years between collisions, collisions are pretty rare, and statistics on rare events are easily distorted by slight changes in contributing factors, including random ones. A 64% increase over the average is therefore probably not statistically significant. For those in the City of Brotherly Love, I say, drive on.
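How much random noise to expect depends on how many drivers sit behind each city's figure. As a rough sketch, suppose each driver independently has a collision in a given year with probability p (a simple binomial model, which is my assumption and not how the report says it computes anything). Then the relative standard error of the observed rate is sqrt((1 - p) / (n * p)). The driver counts below are hypothetical.

```python
import math


def relative_standard_error(p: float, n_drivers: int) -> float:
    """Relative standard error of an observed annual collision rate.

    Assumes n_drivers independent drivers, each with probability p of a
    collision in the year (binomial model; an illustrative assumption).
    """
    return math.sqrt((1.0 - p) / (n_drivers * p))


# One collision every 14 years, read as an annual probability of 1/14.
P_COLLISION = 1.0 / 14.0

if __name__ == "__main__":
    # Hypothetical numbers of drivers behind a city's figure.
    for n in (100, 1_000, 10_000, 100_000):
        rse = relative_standard_error(P_COLLISION, n)
        print(f"{n:>7} drivers: one standard error is {rse:.1%} of the rate")
```

With p = 1/14 the expression reduces to sqrt(13 / n), so the noise is large for a few hundred drivers and small for many thousands. How many drivers are behind each city's number is one more thing we'd need to know before reading much into the gap.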