Showing posts with label statistics. Show all posts

Wednesday, November 11, 2020

No One I Know Committed Voter Fraud

This is not a post about recounts and pursuit of truth. It is not a post about probability. It is a post about imagination.

I don't know 1 million people, much less 70+ million. I cannot even imagine what 1m people looks like. I've been to football games with 100,000 people. One million is like (checks notes) ten times that. 

I can imagine 1 million pieces of paper--dollar bills, pages in books, ballots, etc. 

I know some people who voted for Biden, some for Trump, and some of us (bless our hearts) who still believe in freedom who voted for Jorgensen. But remember, I don't know and cannot even imagine 1m people in any form much less 1m people who all wanted to vote for Biden (or Trump, but that isn't important right now). 

Okay, so I actually can imagine it, but it is a bit hard if I want to concretely think about 1m people showing up and filling out a ballot for Biden. It is much harder still to imagine them all showing up together at one time and doing so. 

But that is what the ballot counting looks like especially after the fact. Boom, X-thousand for Biden, Y-thousand for Trump, etc. 

I've seen enough TV to be able to imagine what a fraud looks like. I can imagine easily a vague picture of what a million or so ballot fraud looks like. Truck pulls up to the back of the warehouse, doors open and a sinister fella peeks out, coast is clear, truck gate is lifted revealing fat stacks of freshly-minted fraudulent ballots, dollies unload the loot...

Add to this that perhaps I have motivated reasoning--I would love (hypothetically) to discover that Biden "won" because of fraud. Combine that with my natural and defensible lack of imagination that millions of people see the world differently than I do and in a way that I think is very significant (it was, after all, the most important election of our lifetime). 

Do you see how it seems more likely, perhaps much more likely, that fraud is at play in the 2020 election? What is more likely, that something I can barely imagine happened or something that I can easily conceive of happened? I'm just asking questions here.

Unfortunately, "seems more likely" is equivalent to "is more likely" for many, many people. The Monty Hall problem contains an amazing paradox. The probability depends on the perspective of the chooser; however, the perspective that matters is not the chooser's imagined framing of the problem. What matters is that the chooser now possesses new information, and given that information the probability assignment has changed for him in a way it has not changed for an uninformed observer. For the chooser it is 2/3 vs 1/3 (i.e., 67%/33%, switch vs stay); for an uninformed observer who walks in to see two closed doors, it is still 50%/50%.
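The Monty Hall asymmetry is easy to check by brute force. Here is a minimal simulation sketch (my own illustration, not part of the original argument):

```python
import random

def play(switch, trials=100_000):
    """Win rate of the stay or switch strategy in the Monty Hall game."""
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # Host opens a door that hides a goat and is not the player's pick.
        opened = random.choice([d for d in doors if d not in (pick, car)])
        if switch:
            # Switch to the one remaining unopened door.
            remaining = [d for d in doors if d not in (pick, opened)]
            pick = remaining[0]
        wins += pick == car
    return wins / trials

print(f"stay:   {play(switch=False):.3f}")   # ~0.333
print(f"switch: {play(switch=True):.3f}")    # ~0.667
```

The switcher converges to 2/3 and the stayer to 1/3, exactly the gap between the informed chooser's odds and the uninformed observer's 50/50 guess.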

Probability is in the eye of the beholder. But the beholder doesn't get to invent out of whole cloth the critical elements governing the probability (subjective though they may be).

I lied, this is a post about probability.

Tuesday, August 6, 2019

We're Doomed, I say. DOOOOOMED!

Is humanity doomed? We certainly don’t lack apocalyptic scenarios: nuclear war, a robot uprising, out-of-control climate change. Unlikely, far-fetched? Not according to scientists and mathematicians who, in recent decades, have found a surprising new source for anxiety about the long-term survival of the human race: probability theory. The so-called “doomsday argument” holds that there is a 50% chance that the end of human life will come within 760 years.
That is the opening paragraph from an essay by William Poundstone in the Wall Street Journal. He also was a recent guest on Michael Shermer's Science Salon podcast, discussing the wide implications of this elegant theory.

Also from the essay:
Since it is equally likely that those of us living today are in the first or second half of all past and future human births, let’s say that we are in the second half—which would mean that there are no more than 100 billion births yet to come. There is a 50% chance that is true, which at the current global birthrate (about 131 million a year) translates to a 50% chance that we have at most 760 more years of births. A changing birthrate would modify that estimate, but the calculation is that simple.
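The arithmetic in the excerpt really is that simple and can be checked in a few lines (numbers taken straight from the quote):

```python
# Numbers from the excerpt: at most 100 billion future births remaining,
# at a current global birthrate of roughly 131 million per year.
remaining_births = 100e9
births_per_year = 131e6

years_left = remaining_births / births_per_year
print(round(years_left))  # 763 -- the essay rounds this to "about 760"
```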
A friend forwarded the original link to me, and we had a bit of discussion on it, basically agreeing that the math and the process are compelling but that the argument seems to be missing something that keeps it from being as significant as it appears. Specifically, I find it very interesting, but it seems to me like a confusion between, or a muddling of, two different concepts.

One (German tanks) is like a kid turning to a football game on TV randomly and guessing about how much longer in real time (not game time) the game will last. The other (humanity) is like being a kid on vacation who wakes up in a car wanting to know "are we halfway there yet." The second case is much harder to answer if we include a key condition that the destination distance is not known by the kid. Even if he knows he is 100 miles from his house in OKC, he doesn’t know if the destination is Branson or New York City or elsewhere. It is much easier to ascertain where he might be in the football game as opposed to the vacation. A score of 14-7 and a flash of the scoreboard showing "3rd Quarter" is much more revealing than a road sign that has a highway number inside a Missouri silhouette. While he can apply the analysis in both cases, his prediction revisions will be orders of magnitude different as time passes for the vacation as compared to the football game.
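For reference, the "(German tanks)" parenthetical refers to the classic German tank problem: estimating how many serially numbered items exist from a small observed sample. A sketch of the standard frequentist estimator (the serial numbers below are hypothetical, chosen only for illustration):

```python
def german_tank_estimate(serials):
    """Minimum-variance unbiased estimate of the total count N, given
    serial numbers sampled uniformly without replacement:
    N_hat = m + m/k - 1, where m = largest serial seen, k = sample size."""
    m, k = max(serials), len(serials)
    return m + m / k - 1

# Hypothetical observed serial numbers:
print(german_tank_estimate([19, 40, 42, 60]))  # 74.0
```

This is the sense in which the football-game case is tractable: like a serial number, the scoreboard carries direct information about where you are in the process. The vacation case, with an unknown destination, gives you no analogous maximum to anchor on.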

Another problem I have is that the time frame is inversely proportional to the future population growth rate. If we slow birthrates down to just above replacement (about 2.1 births per woman), then we extend the time between now and the next 100 billion births. Thomas Malthus and Paul Ehrlich might agree, but Jean-Baptiste Say and Julian Simon (and I myself) would not. So my complaint boils down to this: when applied to something like humanity and its future, this doomsday calculation is not telling us as much as it purports to.

Sunday, February 9, 2014

Crime and Punishment, Law and Order, Optimal Rulebreaking

From Advanced NFL Stats:
Last week a WSJ article about the Seahawks' defensive backs claimed that they "obstruct and foul opposing receivers on practically every play."  I took a deeper look in to the numbers and found that as long as referees are reluctant to throw flags on the defense in pass coverage (as claimed in the article), holding the receiver is a very efficient defensive strategy despite the risk of being penalized.
That is from a guest post by Gary Montry, a professional applied mathematician. The article is very interesting, but it gets a little deeper into the statistics than the points I want to discuss here. Nevertheless, it is a rewarding read that I encourage; as Brian Burke puts it, it is "a great refresher on conditional probabilities and Bayes' theorem".

The article made me think a little about how economic efficiency many times runs counter to our intuition and ideals when it comes to wrongdoing. Novices often get confused by the fact that the economically optimal level of pollution, crime, et al. is not at all zero. It is not that a certain level of pollution is a pure good or that some amount of crime is desirable in an absolute sense--these are still and always "bads" rather than "goods". It is just that at some point the benefit of eliminating the next (aka, marginal unit of) crime or amount of pollution is not worth the cost. At that point we tolerate the "bad". Fortunately, economic progress implies that the cost curve for fighting problems is ever declining.
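The marginal logic in that paragraph can be sketched with deliberately made-up curves (nothing here is empirical): keep eliminating the next unit of a "bad" only while the benefit of doing so exceeds the cost.

```python
def optimal_abatement(marginal_benefit, marginal_cost, max_units=100):
    """Abate successive units of a 'bad' only while the marginal benefit
    of the next unit exceeds its marginal cost; return where to stop."""
    for q in range(max_units):
        if marginal_benefit(q) <= marginal_cost(q):
            return q
    return max_units

# Deliberately made-up curves: the benefit of each successive unit of
# abatement falls, while the cost of achieving it rises.
mb = lambda q: 100 - q
mc = lambda q: 10 + 2 * q
print(optimal_abatement(mb, mc))  # 30 -- well short of eliminating all 100 units
```

The point of the toy numbers is only that the stopping point lands strictly between zero abatement and total abatement, which is why the economically optimal level of a "bad" is not zero.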

Tying this back to the article, the question is how could the rules or enforcement be restructured so that this manipulation, which is arguably against the spirit as well as the letter of the law of the game, is corrected or reduced. Howard Wasserman's new paper on Football and the Infield Fly Rule, which is on my to-read list, may offer some help here. The paper is an exploration of how some football situations may imply and incite behavior that is counter to the spirit of the game and sportsmanship. I don't expect him to address this specific issue, but I do expect the analysis to offer some help in situations such as this.

The article also got me thinking about how my neighborhood's HOA is considering instituting fines for uncorrected violations of the neighborhood's covenants. At issue mainly is roof-mounted satellite dishes that are visible from the street--because we all know that things like this "obviously" lower property values by "a lot" (economic research forthcoming I'm sure). Here are some of my concerns assuming we even have the authority as an HOA to do this and assuming (a BIG assumption) the covenants are optimal as written:

  • Will the punishment (fine) fit the crime? How would we know? If the fine is set so that the behavior is undoubtedly discontinued, we've probably set it too high. If the fine is always paid with no change in behavior, it could be too low (though not necessarily). In fact the optimal fine probably has some of the violations corrected and some continued. But the same people who roll their eyes when economists say we want some level of pollution to continue would probably be in an uproar to think that a neighbor gets to just pay a pittance to continue his property-value-destroying activity. Mrs. Kravitz would be shocked!
  • Do we set the fine equal for all violations (that is the proposal on the table)? Is parking a trailer or a boat for "long periods" in a driveway equal to satellite dishes being visible, equal to trash cans out of compliance, and equal to dead trees not removed or not replaced by the right kind/size of tree, etc.? It seems the answer to that second question is most likely "no", which implies the problem of getting the fines right grows in magnitude.
  • Do we really want the reputation as the neighborhood who runs around assessing fines on one another? Is that property value maximizing? The list and litany of compliance violations came out a bit during the recent HOA meeting. The implication seemed to fall on deaf ears.
  • Have we given up on neighborly persuasion? Can't we all just get along? 
Rule making and rule enforcing are endeavors fraught with unintended consequences. Just desires and outcomes are almost always highly debatable and are always evolving. Simpler is usually better. Persuasion is generally preferred to force. Tread lightly.

PS. I knew I was in trouble when the HOA asked if the trees I had planted were "free-range" or "farmed". 

Sunday, November 25, 2012

The difference between winning and losing

Oklahoma football coach Bob Stoops likes to say, "there's a big difference between winning and losing." Generally this is invoked after a narrow victory, especially when the opponent was supposedly overmatched. Here is an example from the 2002 season describing the 37-27 victory over Alabama in which OU had blown a 20-point lead. And here is an example from the 2004 season describing the 42-35 victory over Texas A&M. But here is an example from this season where he seemingly contradicts the oft-repeated mantra. Last night's thrilling Bedlam game gave Stoops another chance to claim that the difference between winning and losing is a vast gulf. So far, I don't believe he has.

I am not aiming to indict Bob Stoops for this common but nonetheless faulty reasoning. Many coaches in many sports have said the same. But I do want to take this opportunity to dispute the idea that a narrow victory is a substantial victory, and I will be using OU as my example. I had planned this blog post before last night's game. What an interesting coincidence that the game was a perfect example for the case I will make.

Missed it by that much

Stoops, et al. have it backwards. There is actually very little difference between winning and losing, in life generally and in college football especially. College football outcomes, like those in the NFL, are driven by random chance to a surprisingly large degree. For the NFL, Brian Burke estimates that over 50% of the outcome is random. I've made calculations similar to Brian's and come up with a 60%-70% share of college football outcomes attributable to random chance. That alone should give us pause. If there is a lot of randomness (a large error term) in football outcomes, how much or how little credit can we attribute to everything else?
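This is not Burke's calculation or mine, but a toy simulation makes the general point: even a team whose true per-game win probability is 70% (a number assumed purely for illustration) will post mediocre records alarmingly often over a 12-game season, by chance alone.

```python
import random
from collections import Counter

def season_records(p_win=0.7, games=12, seasons=100_000):
    """Distribution of season win totals for a team whose true
    per-game win probability is p_win, over many simulated seasons."""
    return Counter(
        sum(random.random() < p_win for _ in range(games))
        for _ in range(seasons)
    )

counts = season_records()
total = sum(counts.values())
share_mediocre = sum(v for k, v in counts.items() if k <= 8) / total
print(f"Share of seasons at 8-4 or worse: {share_mediocre:.2f}")
```

With these assumed numbers, roughly half the simulated seasons finish 8-4 or worse even though the team is a true 70% team, which is the fine line between winning and losing in numerical form.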

Let's think about the difficulty in evaluating performances ex post without letting the actual outcome bias the appraisal. Consider a comeback against a lesser opponent that falls short versus one that succeeds. Suppose the game ends up being decided by a last second field goal attempted by the team favored to win. Make the field goal, and the commentators will look back on a splendid series of gritty plays that made the difference. Miss it, and the same commentators will describe how inept was the entire performance. This isn't consistent. All of the performance was the same up until that one single play.

So it turns out that a "brilliant" throw by a quarterback threading the needle between two defenders for a touchdown and the same throw being "ill advised" when intercepted can impact both the outcome of a game greatly as well as how we feel about that outcome. The random factors that govern the success of such a high impact, high sensitivity event are probably the critical factors, and they can cut either way. It seems there really is a fine line between stupid and clever.

OU was close, very close, to winning the BCS National Championship in the 2003 and 2008 seasons. A handful of plays against LSU and Florida, respectively, went a long way toward determining those championship games' outcomes. But it would be inconsistent to hold that view about those seasons and games while clinging to the Big Difference theory. If the Big Difference is true, OU was a long way away from holding the crystal ball. Similarly, the Sooners were able to ever so narrowly escape numerous tight situations in their 2000 title run. Oklahoma State came within a few fingertips, on a last-second touchdown pass, of ending the dream season. But for a few heroics at Texas A&M two weeks before the OSU game and in the Big XII title game against Kansas State, OU would have been out of the hunt. In the championship game itself Florida State came very close to winning.

Here are the lessons to draw from this:
  • It is not the actual outcome of specific close events that matter so much as the entire volume of evidence. 
  • As distasteful to some as it is, margin of victory matters. Prediction models for college football among others are significantly enhanced when margin of victory is included rather than just win-loss results.
  • Whether declaring the strength of the mandate a close election has created for a winning candidate or trumpeting a narrow victory on the gridiron, the logic is flawed. We need to be humble and reasonable in our assessments. That includes working hard to not let the outcome bias the assessment. 
Congratulations to both the Sooners and the Cowboys on a great game filled with wonderful excitement. As a fan I can write that more easily because my team prevailed. I know that my joy is not equal to the pain felt by Cowboy fans, and, perverse as it is, my joy is enhanced knowing they suffer and what that suffering feels like--I've been on the other side. I am very happy the Sooners won. I'm not sure if I wish they could have won 51-0 rather than 51-48, but I am sure a 51-0 outcome would mean a lot more.

Update: edited to correct a few grammatical mistakes.

Wednesday, October 3, 2012

Here's a vote for rationality

I planned on writing a post on why I don't vote. In it I planned on laying out the inconsistencies and illogical reasoning behind the many arguments offered by the pro-vote movement. Rather than hack my way through that, I will direct you to the timely and very well written article in the latest issue of REASON by Katherine Mangu-Ward. In it she hits all the critical points. 

Here is a particularly good passage:
Voting is widely thought to be one of the most important things a person can do. But the reasons people give for why they vote (and why everyone else should too) are flawed, unconvincing, and sometimes even dangerous. The case for voting relies on factual errors, misunderstandings about the duties of citizenship, and overinflated perceptions of self-worth. There are some good reasons for some people to vote some of the time. But there are a lot more bad reasons to vote, and the bad ones are more popular. 

The first thing I like to do when confronted (make no mistake it is always a confrontational attitude) with the question, "Why don't you vote?" is to reverse the questioning, "Why DO you?" This lets me know exactly what approach my adversary is taking: wistful hope shrouded in mathematical ignorance, a desire or duty-bound obligation to feel a part of the process, a genuine understanding of the futility coupled with a defendable enjoyment of voting, et al. 

My simple explanation for my position is as follows: it is a matter of principle and pragmatism. First the practical: my vote will not affect the outcome of an election. "But what if everyone thought and acted that way?" comes the familiar refrain. "Then I would vote and determine the outcome of elections. Now let's drop the childish hypotheticals." If you believe that your vote "counts", you are simply and severely mathematically mistaken. I would expect that you are more likely to mis-vote for the opposing side than to vote as intended and meaningfully affect the outcome. You can enjoy the process and justify your actions on those grounds. In this sense rational voting is like rational gambling: you should vote/gamble because of the pleasure of the experience itself (contribution to democracy/thrill of a potential jackpot), not because you believe you will likely change the outcome/win more than you lose.
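The "severely mathematically mistaken" claim can be made concrete with the standard back-of-the-envelope calculation (my own illustration, not from the REASON article): under the normal approximation to the binomial, the probability that one vote breaks an exact tie in a dead-even electorate of n voters is roughly sqrt(2/(pi*n)).

```python
import math

def pivotal_probability(n, p=0.5):
    """Approximate probability that a single vote is decisive in an
    electorate of n voters: the chance of an exact tie among the others,
    via the normal density at the mean of a Binomial(n, p)."""
    sigma = math.sqrt(n * p * (1 - p))
    return 1 / (sigma * math.sqrt(2 * math.pi))

# Electorate sizes chosen for illustration.
for n in (10_000, 1_000_000, 130_000_000):
    print(f"n = {n:>11,}: P(decisive) ~ {pivotal_probability(n):.1e}")
```

Even in a perfectly split national electorate the chance is on the order of one in tens of thousands, and any lean away from 50/50 makes it astronomically smaller still, which is why a stray mis-vote plausibly dominates.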

You can also take (the smallest) pride in knowing that you did affect the aggregate numbers for your side by a unit of one. And by proudly broadcasting the "I voted!" signal, you are perhaps unwittingly showing that you went to relatively great expense to support your side. But for what are you truly showing support?

This brings me to the principled reason. Generally there is very little difference between the candidates in an election--emphasis on very. To the extent differences appear, experience shows they disappear or are significantly reversed once rhetoric becomes policy. When voting on specific ballot measures, where the distinctions between sides appear more clear cut, we are up against two forces: unintended consequences and the futility of fighting the tides. Voting down a tax increase may open a backdoor for politicians to increase debt. Voting against a measure to grant eminent domain powers to a private company doesn't change the fact that many in the population believe progress requires surrendering this liberty. Most importantly, my non-vote is a vote of disgust at the ever-reaching growth of the state and the worship of collective action as a problem solver. And I guess it is a little jab at those who don't understand probability.

Wednesday, August 29, 2012

Stats without baselines

I was listening to NPR this morning and near the end of Marketplace Morning Report came a common error seen and heard all too often: statistics that seemingly impart deep meaning but in fact are meaningless because we lack a reference point from which to judge them.

The specific occurrence in this case was a brief comment on Allstate Insurance's latest release of its annual report, "Allstate America's Best Drivers Report". The two snippets were that Philadelphia drivers were 64% more likely to be in a collision than the average driver nationally and that the safest drivers are in Sioux Falls, South Dakota where drivers on average go 14 years between collisions. The actual official report is here.

Besides the problems with how each statement is worded as compared to what those statistics are actually saying (I may have the quotes wrong as well), there is the bigger concern, for me at least, that we really don't know if those numbers are in any way significant. To know that we'd need to know more about the full data set including the actual averages and dispersion. We'd also probably like to know how volatile these statistics are year after year and how they were computed. But we don't know any of that from the report; so all we are left with are the impressions they make at first glance, which are highly subject to personal bias and incorrect interpretation. We'd also like to know if the statistics are adjusted for factors such as miles driven and conditions like weather (they are not).

But thinking about both numbers together we can cobble together a little logic to help know that they are probably not significant numbers. It would be highly unlikely that drivers in Sioux Falls or any other city are that different any other American city. At 14 years between collisions, collisions seem pretty rare. Rare events can be easily distorted by slight adjustments in contributing factors including random factors. A 64% increase over average is therefore probably not statistically significant. For those in the City of Brotherly Love, I say, drive on.