Obscure statistical concepts are driving the news these days.
The coronavirus pandemic has thrust data-wranglers into a national spotlight they normally enjoy only in the run-up to a presidential election. The death projections put forth by the University of Washington’s COVID-19 model have become the New York Times election needle of early-to-mid 2020: We all hang on their every fluctuation. Other recent developments have treated the public to discussions of statistical concepts ranging from “sensitivity and specificity” to “selection bias” to “quality-adjusted life years.”
For those who haven’t been following this obsessively — what, you have better things to do? — here’s a quick guide to the key developments.
1. Every prediction has an asterisk, and it’s your job to read the fine print.
Numerous disease experts have tried to “model” where COVID-19 is headed, and their predictions have created no end of confusion, anger, and outrage. These tools are worth consulting, because they provide the educated guesses of people who have dedicated their lives to studying epidemics. But each prediction is built on a set of assumptions and methods, and you cannot understand a prediction unless you also understand the machinery behind it. Sometimes the media and the public fail to do their due diligence, and sometimes the machinery malfunctions.
Remember the prediction of 2.2 million deaths from Imperial College? That was a prediction based on a “do nothing” scenario, a baseline where life simply went on as before. The same report offered lower projections in scenarios where we took steps to control the virus. The 2.2 million forecast will not be proven wrong when we end up with far fewer deaths, because we didn’t, in fact, do nothing — though the model’s assumptions about how the virus spreads and how deadly it is may not age well. Those who hyped the prediction, and those who continue to harp on it to this day, simply misunderstood what it represents.
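To see why a “do nothing” projection and a mitigation projection can both be right on their own terms, here is a toy SIR (susceptible-infected-recovered) model. It is a minimal sketch, not the Imperial College model: the population, fatality rate, infectious period, and both R0 values are illustrative assumptions. The only thing that differs between the two runs is the assumed transmission rate.

```python
def sir_deaths(population, r0, ifr, days=730, infectious_days=5.0, initial_infected=100):
    """Simulate a basic SIR epidemic in one-day steps and return projected deaths.

    Deaths are approximated as ifr times everyone ever infected; this toy model
    has no hospital capacity, age structure, or behavior change.
    """
    gamma = 1.0 / infectious_days   # daily recovery rate
    beta = r0 * gamma               # daily transmission rate implied by r0
    susceptible = population - initial_infected
    infected = float(initial_infected)
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
    ever_infected = population - susceptible
    return ifr * ever_infected


# Hypothetical inputs: a U.S.-sized population and an assumed 0.9% fatality rate.
POPULATION = 330_000_000
IFR = 0.009

do_nothing = sir_deaths(POPULATION, r0=2.4, ifr=IFR)  # life goes on as before
mitigated = sir_deaths(POPULATION, r0=1.3, ifr=IFR)   # assumed effect of distancing
print(f"do nothing: {do_nothing:,.0f} deaths")
print(f"mitigated:  {mitigated:,.0f} deaths")
```

Neither number is a forecast of what will happen; each says what would happen if its assumptions held, which is exactly the fine print a scenario projection comes with.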
There’s also, of course, the much-discussed and much-maligned Washington model. Where the Imperial model shows what happens when everyone misunderstands what a model is saying, the Washington model illustrates how even experts can go wrong: It simply hasn’t predicted deaths all that well, and rival experts have highlighted flaws in its inner workings. What they’re doing is very difficult, and I wouldn’t blame them for changing the model when new information comes in, but if it doesn’t work, it doesn’t work.
2. Division is tricky when you don’t know the numerator or the denominator.
To determine the “infection fatality rate” of a virus, you divide the number of people it kills by the number of people it infects. That’s pretty easy — provided you have those numbers in hand. We don’t.
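The division itself is trivial; the uncertainty is not. A minimal sketch of how uncertainty in both numbers compounds, using hypothetical figures for an imaginary city:

```python
def ifr_range(deaths_low, deaths_high, infections_low, infections_high):
    """Return the lowest and highest infection fatality rate (deaths divided by
    infections) consistent with ranges for both numerator and denominator."""
    lowest = deaths_low / infections_high    # fewest deaths over most infections
    highest = deaths_high / infections_low   # most deaths over fewest infections
    return lowest, highest


# Hypothetical city: 1,000 to 1,500 deaths, 100,000 to 500,000 infections.
low, high = ifr_range(1_000, 1_500, 100_000, 500_000)
print(f"IFR somewhere between {low:.1%} and {high:.1%}")
```

With these made-up inputs the plausible fatality rate runs from 0.2 percent to 1.5 percent, a 7.5-fold spread, even though neither input range looks outlandish on its own.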
On the infections front, it turns out that many people get the virus without symptoms severe enough to make them seek medical attention. So while we have an official count of confirmed cases, it is definitely an undercount. “Serology” studies (more on these below) suggest there are many times more infections than confirmed cases, but it’s impossible to say exactly how large the gap is.
Counting deaths is no easier. As skeptics have been saying for weeks, not everyone who dies with COVID dies of it. On the other hand, not everyone who dies of COVID is being counted that way, especially people who die at home. We might eventually get a better idea of the death toll by looking at “excess deaths” (the degree to which overall mortality rose, whether or not the deaths were attributed to the disease), though this approach has difficulties too. The lockdowns will probably reduce deaths from car accidents, for example, while increasing deaths from suicide and postponed medical care. These effects will need to be separated from actual COVID deaths.
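The excess-deaths idea comes down to simple arithmetic: compare observed deaths with what a normal year would have produced. The sketch below uses a crude baseline (the average of the same weeks in prior years) and entirely hypothetical counts; real analyses also adjust for population growth, seasonality, and reporting delays.

```python
from statistics import mean


def excess_deaths(observed, prior_years):
    """Excess deaths for each week: observed deaths minus the average for the
    same week across prior years. This is a deliberately crude baseline."""
    expected = [mean(year[week] for year in prior_years) for week in range(len(observed))]
    return [obs - exp for obs, exp in zip(observed, expected)]


# Hypothetical weekly all-cause deaths for three weeks.
prior_years = [
    [1_000, 1_010, 990],    # two years ago
    [1_020, 1_000, 1_000],  # last year
]
observed = [1_100, 1_400, 1_600]
official_covid_deaths = 900  # hypothetical official tally for the same weeks

excess = excess_deaths(observed, prior_years)
total_excess = sum(excess)
# This gap mixes uncounted COVID deaths with lockdown side effects
# (fewer car crashes, more postponed care), which still need separating.
unexplained = total_excess - official_covid_deaths
print(excess, total_excess, unexplained)
```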
Nonetheless, one thing we do know with reasonable certainty is that this is far worse than the flu. In just a few months, based on the official tallies alone, it has killed about one in every 500 residents of New York City and about 60,000 Americans. And the early data on excess mortality suggest we’re undercounting, not overcounting, COVID deaths.
3. Straightforward numbers might not mean what you think.
Thanks in large part to the recent wave of serology tests — which measure whether someone has had COVID-19 in the past — there have been lots of very different numbers flying around regarding how many people might be immune to the virus and how badly we’re undercounting cases. Some of the variation here is simply because the tests were run in different places and some places have been hit harder than others. But some of it results from other problems with these studies.
First, most of these studies are not based on random samples. The tests were given to people who happened to be walking around in public, answered a Facebook ad, or visited an urgent-care clinic. There is no reason to expect such people to be representative of the overall population with regard to their exposure to COVID-19.
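A small expected-value calculation shows how self-selection alone can distort a positive rate. The volunteering probabilities here are hypothetical, and the bias can run in either direction depending on who chooses to get tested:

```python
def volunteer_positive_rate(true_prevalence, p_volunteer_if_infected, p_volunteer_if_not):
    """Expected share of positives among people who volunteer for testing when
    the chance of volunteering depends on whether they were infected."""
    infected_volunteers = true_prevalence * p_volunteer_if_infected
    uninfected_volunteers = (1 - true_prevalence) * p_volunteer_if_not
    return infected_volunteers / (infected_volunteers + uninfected_volunteers)


# Hypothetical: 5% truly infected, but people who suspect they had it are
# three times as likely to answer the ad.
print(volunteer_positive_rate(0.05, 0.30, 0.10))  # roughly 0.136, not 0.05
```

If infected people were instead less likely to show up (say, because they were home sick), the same formula would understate prevalence.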
And second, the tests themselves are not perfect: they sometimes give false positives and false negatives. (This is the “sensitivity and specificity” issue; don’t bother learning which is which because you’ll never remember.) Sometimes false positives and false negatives roughly cancel each other out. But in areas where few people have had the virus, one can encounter a situation where, say, 2 percent of people test positive, but the test has a false-positive rate in the vicinity of 2 percent. In that case, the data can’t rule out that nearly all of the positive results are false.
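The arithmetic behind that warning can be sketched with two standard formulas: the positive predictive value (how likely a positive result is to be real) and the Rogan-Gladen correction, which backs an estimated true prevalence out of the raw positive rate. The test characteristics below are hypothetical:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Chance that a positive result is a true positive.

    sensitivity: share of truly infected people who test positive.
    specificity: share of never-infected people who test negative,
    so the false-positive rate is 1 - specificity.
    """
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def corrected_prevalence(apparent_rate, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from the raw positive rate,
    clipped to the range [0, 1]."""
    estimate = (apparent_rate + specificity - 1) / (sensitivity + specificity - 1)
    return min(1.0, max(0.0, estimate))


# Hypothetical test: 90% sensitivity, 98% specificity (2% false positives).
print(corrected_prevalence(0.02, 0.90, 0.98))       # about 0: positives could all be false
print(corrected_prevalence(0.10, 0.90, 0.98))       # about 0.091
print(positive_predictive_value(0.01, 0.90, 0.98))  # about 0.31
```

The correction only works if the test’s sensitivity and specificity are known precisely; when they are themselves uncertain, so is the corrected estimate.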
Bottom line: serology tests are promising, but the existing studies don’t allow us to nail much down definitively.
4. It’s hard to know if A is higher than B if you don’t know A or B.
The mother of all policy questions through all this has been whether it’s worth it to lock down society. The field of “cost-benefit analysis” gives us the tools we need to frame this trade-off. But it doesn’t fully answer the question.
To know the costs of locking down, we need to know the full economic damage we’re inflicting — but we can’t know that, since that damage could be long-lasting. We also need to know exactly how much lockdowns slow the virus’s spread, but that’s a matter of intense dispute, too. And even if we agreed on how much economic damage we were trading for how many lives, we’d then have to agree on a metric for valuing those lives. As I’ve pointed out elsewhere, the available metrics diverge wildly, especially when it comes to prolonging the lives of people with few years left.
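A stripped-down comparison shows how much the choice of metric can matter. It values the same hypothetical policy two ways: a flat value per statistical life saved, and a value per quality-adjusted life year (QALY) gained. Both dollar figures and the policy’s cost are illustrative assumptions, not official numbers:

```python
# Illustrative assumptions only; real figures vary by agency and study.
VALUE_OF_STATISTICAL_LIFE = 10_000_000  # hypothetical dollars per life saved
VALUE_PER_QALY = 100_000                # hypothetical dollars per quality-adjusted life year


def benefit_by_lives(lives_saved):
    """Every life saved counts the same, regardless of age or health."""
    return lives_saved * VALUE_OF_STATISTICAL_LIFE


def benefit_by_qalys(lives_saved, qalys_per_life):
    """Each life saved counts by the quality-adjusted years it gains."""
    return lives_saved * qalys_per_life * VALUE_PER_QALY


# Hypothetical policy: costs $2 billion and saves 1,000 lives,
# mostly among people with about 5 quality-adjusted years left.
cost = 2_000_000_000
by_lives = benefit_by_lives(1_000)     # $10 billion: passes the test
by_qalys = benefit_by_qalys(1_000, 5)  # $500 million: fails it
print(by_lives > cost, by_qalys > cost)
```

The same policy passes under one metric and fails under the other, and the gap widens as the people being saved have fewer years left.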
* * *
The upshot of all this is not that statistics are useless and we don’t know anything; to the contrary, statistics are among our best tools for fighting this thing, and with every passing day we’re getting better data on how the virus behaves and what works to stop it.
Instead, what I mean to convey is that science is a very messy project and it’s best not to get too invested in any single number. The evidence changes all the time, forcing us to change what we believe.