A Radish Saltant: The Confusing World of Disease Mortality Statistics in Simple Numbers

2b/~2b: 'How Many?' is the question!

There is a lot of confusion and debate over mortality figures for the novel coronavirus (COVID-19, formerly 2019-nCoV). Most people see the numbers but do not understand how they are derived. As a result, they may be unsure how to compare numbers from different outbreaks, or even from the same outbreak on different days or from different sources.

As discussed in my previous article, "Bat Soup for the Soul: Teaching with Coronavirus", the simple answer to the question of how deadly this new virus is: it is a good deal less deadly than SARS-CoV was and a good bit more deadly than the seasonal flu (though it affects somewhat different age groups, which is out of scope for this article). At the same time, it is markedly more transmissible than SARS was and somewhat less transmissible than the flu. So, the bottom line is that it does less damage on an individual basis than SARS but has already affected many more individuals (and continues to do so). Similarly, it is likely to spread less effectively than the flu but to hurt more of the people it does infect (especially the elderly).

[Version 1.1 20200311: corrected typo in equation. Thank you CEMV!]

Less Deadly Is Not Always 'Good'

In general, we often see that less deadly diseases spread faster, for the simple reason that people who get very sick very quickly tend not to run around spreading disease! When someone has only mild symptoms or takes longer to get sick, they have more opportunities to pass the infection on. But let us take a quick look at how the mortality figure is derived and why estimates may differ so sharply. We will walk through the math, deliberately starting with very simple numbers:

Let's say you have an outbreak with 20 people infected. At the time we measure, there are 5 fatalities, 5 serious cases, 5 recovered cases, and 5 mild cases. What is the fatality rate?

The quick answer is to divide 5 fatalities by 20 total cases for 25%:

5/20 = 0.25 = 25%

This is more or less the type of number often published for COVID-19. At this moment, using Johns Hopkins' tracker, you get:

4,373 deaths / 121,564 total cases = 0.0359728 or 3.6%

Don't put ANY stock in that specific number because it will be different by the time you read this. If you take this number at different times over the outbreak, the number varies somewhat, and the numbers published by various clinicians or regional authorities vary a great deal because they are taking numbers from their specific populations. Depending on what numbers you use, you can get anywhere from 0.7% to almost 8%, for instance, from different phases of the outbreak in China (according to WHO's report on the Joint Mission to China at the end of February).
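For readers who want to play along, the crude-mortality arithmetic above fits in a few lines of Python. This is a minimal sketch: `crude_mortality` is just a name I picked for illustration, and the Johns Hopkins figures are the snapshot quoted above, which will already be out of date by the time you run it.

```python
def crude_mortality(deaths: int, total_cases: int) -> float:
    """Deaths divided by ALL confirmed cases, resolved or not."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return deaths / total_cases

# Simple example: 5 dead, 5 serious, 5 recovered, 5 mild.
toy = {"dead": 5, "serious": 5, "recovered": 5, "mild": 5}
toy_total = sum(toy.values())  # 20 cases in all

print(f"{crude_mortality(toy['dead'], toy_total):.1%}")  # 25.0%

# Johns Hopkins snapshot quoted in the text (changes daily).
print(f"{crude_mortality(4_373, 121_564):.1%}")  # 3.6%
```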

OK, so why are people arguing about this? Why are some people saying the number is "wrong" or "likely wrong"?

Well, there are a couple of issues with using this number reflexively.

Crude Mortality versus Completed Cases

First, the number is subtly different from what most people mean by the probability of dying from a disease. The number above is what is often referred to as "crude mortality", because it includes uncompleted cases. What does that mean?

In our first set of numbers, we have 10 people (5 serious cases and 5 mild cases) who have neither recovered nor died (yet). Presumably, they will do one or the other eventually. When looking at past epidemics, like the final numbers for the 2002-2003 SARS outbreak, every case is completed, because no one is still walking around actively infected with SARS-CoV-1! So let's fix the number by only including completed cases:

5 deaths / (5 deaths + 5 recovered) = 0.5 = 50% (!)

Ten people in our example have either died or recovered, so that goes on the bottom. For the other ten, we simply do not know (yet) what will happen. Hopefully that makes sense so far. Mortality calculated from completed cases will tend to be higher during an active outbreak than for a past one, so one must take some care when comparing numbers reported during an active outbreak with historical ones. But it takes time during an outbreak to accumulate statistically meaningful numbers of recovered cases, so crude mortality is usually what you get.

Applying the same fix to the real coronavirus numbers, we get:

4,373 deaths / (4,373 + 66,239) = 0.061929984 or 6.2%
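The completed-case version is the same sketch with a different denominator. Again, `completed_case_mortality` is a hypothetical name of my own; the point is simply that active cases (serious or mild) drop out of the calculation entirely.

```python
def completed_case_mortality(deaths: int, recovered: int) -> float:
    """Deaths divided by cases that have finished (died or recovered)."""
    completed = deaths + recovered
    if completed <= 0:
        raise ValueError("no completed cases yet")
    return deaths / completed

print(f"{completed_case_mortality(5, 5):.1%}")           # 50.0%
print(f"{completed_case_mortality(4_373, 66_239):.1%}")  # 6.2%
```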

This is usually what people are really thinking of when they ask, "If I am in fact infected, what is my chance of dying once the disease runs its course?" As you can see, it is worse than the crude mortality frequently published. In our simple example, if only two of the serious cases later die and the rest recover, you will see yet a different (lower) number. But wait...
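To see what "a different (lower) number" looks like, here is a hypothetical helper, `resolve`, that closes out every open case in the simple example: two of the five serious cases die, and everyone else, including all five mild cases, recovers. That last part is my assumption, and the helper is illustrative only.

```python
def resolve(deaths: int, recovered: int, open_cases: int, later_deaths: int):
    """Hypothetical helper: close out every open case, assuming
    `later_deaths` of them die and all the others recover."""
    if not 0 <= later_deaths <= open_cases:
        raise ValueError("later_deaths must be between 0 and open_cases")
    final_deaths = deaths + later_deaths
    final_recovered = recovered + (open_cases - later_deaths)
    return final_deaths, final_recovered

# Simple example: 5 dead, 5 recovered, 10 still open (5 serious + 5 mild).
d, r = resolve(deaths=5, recovered=5, open_cases=10, later_deaths=2)
print(d, r)                  # 7 13
print(f"{d / (d + r):.1%}")  # 35.0%
```

Once every case is completed, crude mortality and completed-case mortality agree (7/20 either way), and it becomes clear that the early 25% was too low and the early 50% too high.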

How Many People Actually Get the Disease?

The number you get is clearly heavily influenced by the number of cases of infection you use in the first place. Is this number "correct"? Well, probably not, and how much it is off is a matter of great debate. What happens if you are "infected" but have a mild case (or maybe do not even notice) and never get tested? You won't be included in the numbers at all. Going back to our simple example, if we say that the mild cases are simply never noticed, we get:

5 deaths / 15 cases = 0.333... or 33%

This number is higher than our initial 25%, but we know it does not actually reflect reality. So, let us say that instead of 121,564 cases of COVID-19 world-wide (the confirmed case count from above), we actually have one mild or asymptomatic case for each confirmed case, someone running around who may think they merely have a cold or whatever. Then we get:

4,373 deaths / (121,564 × 2) total cases = 0.0179864 or 1.8%
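Here is a minimal sketch of that adjustment. The multiplier `hidden_per_confirmed` is a guess, not data, and `adjusted_mortality` is a hypothetical name; the loop at the end simply shows how much the answer moves as that one guess changes. The simple example is included too: if we had correctly guessed one hidden case for every three noticed (5 mild cases missing from 15 counted), we would recover the "true" 25%.

```python
def adjusted_mortality(deaths: int, confirmed: int, hidden_per_confirmed: float) -> float:
    """Crude mortality after inflating the case count for infections that
    were never tested. `hidden_per_confirmed` is an assumption, not data."""
    if hidden_per_confirmed < 0:
        raise ValueError("hidden_per_confirmed cannot be negative")
    estimated_cases = confirmed * (1 + hidden_per_confirmed)
    return deaths / estimated_cases

# Simple example: 15 counted cases, 5 mild ones never noticed.
print(f"{adjusted_mortality(5, 15, 5 / 15):.1%}")  # 25.0%

# The text's assumption: one unnoticed case for every confirmed one.
print(f"{adjusted_mortality(4_373, 121_564, 1.0):.1%}")  # 1.8%

# Sensitivity of the answer to that single guess.
for hidden in (0, 1, 2, 4):
    print(hidden, f"{adjusted_mortality(4_373, 121_564, hidden):.2%}")
```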

Well, that looks better, doesn't it? This is the kind of thing you will see in many estimates of COVID-19 mortality, depending on what they use as their guess of how many mild or asymptomatic cases there are. In theory, the unknowns could affect the death count as well (two of the confirmed cases in Washington state were diagnosed postmortem), but we tend to be a bit better at noticing when someone actually keels over as opposed to when they just have a sniffle for a day or two.

Getting Actual Numbers

So, how does one figure out which number is the "correct" number to use for actual cases? How do you account for what you do not know?

Well, people guess using various disease models based on past outbreaks or on detailed numbers from one part of an outbreak. But the tried-and-true method is to swab and test everything that moves throughout a community (at least on a random-sample basis) to find out how many people running around have the disease but have not actually shown up at a hospital. China, after a very rough beginning, has started to do this and, as a result, its case counts, while initially sketchy, are a great deal more reliable. They did find unreported cases lurking in the community (mild cases, cases mistaken for something else, people afraid to report, etc.), but not that many. South Korea has also done extensive testing around its outbreak (and, interestingly enough, its mortality figures are closer to 0.7%, at the low end of what China found).
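As a rough sketch of how random-sample testing turns into a case estimate: test a simple random sample, take the fraction positive, and scale it up to the whole population. The function below is hypothetical, the town and its numbers are invented, and the interval is the basic normal-approximation (Wald) interval. Real surveys also have to account for imperfect tests and non-random sampling, which this sketch ignores.

```python
import math

def estimate_infections(population: int, sample_size: int, positives: int,
                        z: float = 1.96):
    """Scale the positive rate in a simple random sample up to the whole
    population, with a rough Wald interval (z=1.96 for about 95%)."""
    if sample_size <= 0 or not 0 <= positives <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= positives <= sample_size")
    p = positives / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - half_width)
    high = min(1.0, p + half_width)
    return population * p, population * low, population * high

# Entirely hypothetical town: 100,000 people, 2,000 swabbed at random, 30 positive.
point, low, high = estimate_infections(100_000, 2_000, 30)
print(f"~{point:,.0f} infections (roughly {low:,.0f} to {high:,.0f})")
```

Comparing an estimate like this with the confirmed case count gives a measured value for the "hidden cases per confirmed case" that would otherwise have to be guessed.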

The US has done very little of this and has suffered from a chronic shortage of test kits. Numbers for our domestic outbreaks (and, consequently, estimates of mortality in the US) are therefore extremely poor. Presumably, if we actually had the foggiest clue how many people were infected, our mortality figures would be much lower than they appear. But we just do not know, and cannot until the test kits catch up, which they are starting to do as of this writing on 11 March.

Be aware, then, that if you use global case counts and deaths, you are getting a mixed bag of good data and bad data. That results in a number which (well, it isn't wrong; it is a calculation, and it is what it is) may not be very reliable for predicting the future. Using numbers from countries or regions we know have better data may give better results, but then you have to ask yourself whether the results China gets in its health system, or South Korea in its, will apply equally to the US population and our health system. Roughly, perhaps, but never exactly. HIV spread very differently in European populations than in African populations, due in part to what is thought to be a genetic leftover from bubonic plague: that kind of thing happens and is inherently unpredictable.
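A toy illustration of the "mixed bag" problem, using two invented regions: one that tests widely and one that catches only a quarter of its cases. The numbers are made up, and `true_cases` is exactly the quantity nobody actually gets to observe. Pooling the two produces a reported figure that matches neither region.

```python
# Invented numbers for two hypothetical regions (illustration only).
# "true_cases" is the quantity nobody actually gets to observe.
regions = {
    "well_tested":  {"deaths": 10, "confirmed": 1_000, "true_cases": 1_100},
    "under_tested": {"deaths": 10, "confirmed": 250, "true_cases": 1_000},
}

def pooled(field: str) -> int:
    """Sum one field across all regions."""
    return sum(r[field] for r in regions.values())

for name, r in regions.items():
    reported = r["deaths"] / r["confirmed"]
    actual = r["deaths"] / r["true_cases"]
    print(f"{name}: reported {reported:.1%}, actual {actual:.1%}")

pooled_reported = pooled("deaths") / pooled("confirmed")
pooled_actual = pooled("deaths") / pooled("true_cases")
print(f"pooled: reported {pooled_reported:.1%}, actual {pooled_actual:.1%}")
```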

Conclusion

So, what then? What conclusion can we solidly make?

Well, we come back to the beginning: "a good deal less deadly than SARS-CoV was and a good bit more deadly than the seasonal flu". (And, by the way, this virus seems to leave most children (those under 20) alone, which is rather interesting, isn't it?)

Wednesday, March 11, 2020
