It is time to speak of many things, of cabbages and kings, of why Bat Soup is boiling hot and whether it has wings...
There is a great deal of media discussion about the 2019-nCoV (2019 Novel Coronavirus) outbreak in Wuhan, China. Some are predicting dire catastrophe; others are saying it is just a distraction from impeachment. The problem is that most people do not understand viruses or epidemiology well enough to judge what is being written or to understand whether this or that piece of recent news is important. I am, myself, "concerned" about the outbreak: very concerned about the catastrophe for the victims in China and "somewhat concerned" about what may happen here. I also see this as a "teaching moment" to try to explain some of the concepts behind the progress of, and efforts against, the disease.
- Draft 1.1.1 11 March 2020 - Added link to Flatten the Curve chart (#FlattenTheCurve) and some discussion at end of article now that we have community spread in the US.
- Draft 1.1 2 February 2020: Added and briefly discussed a Lancet paper presenting a more involved (SEIR) model. Editorial corrections. Organized References. 1.1.01 same day: typo correction.
- Draft 1.01 29 January 2020: Corrected significant typo in discussion of Basic Reproduction Number. Thanks CEMV.
- Draft 1.0: 28 January 2020. Initial complete text. Needs a proof-reading pass or two, apologies.
If you are in a real hurry and do not have time to learn the underlying how and why, this same basic thing is presented in one chart as Flatten the Curve. I discuss this idea a bit more at the bottom, along with why we have suddenly gone from trying to "stop" the virus to spreading it out in time. (Thank you, Christie!)
Personally, I went from college (Environmental Science) to Air Force Studies and Analyses. My thesis was the production of a computer simulation toolkit for environmental and biological systems in C++. When I was learning these things, the computer resources for exploration were either not available for students or extremely expensive, and I added to the pool of such tools available. At the Pentagon, I mainly supported intelligence analyses using computers: improving, maintaining, and writing tools to analyze intelligence data, including Nuclear, Biological, and Chemical (NBC) warfare models. Since I did not have formal training in epidemiology, I had to learn much of it the hard way, talking to people who did and entombing myself in the Pentagon library for days at a time until I understood what I had to make the simulation simulate, making mistakes, and doing it again until the mistakes went away. That experience does not make me a virologist or an epidemiologist now, but it means I have enough background to understand the papers being published and the data about the course of the disease.
[If you are dumb (or determined?) enough to try to learn the same way I did, some useful starting points are given at the bottom...]
I am going to try to explain some basic principles here about how some of the data coming out of China might affect the United States if the virus spread across the Pond and achieved effective human-to-human transmission here. What I am going to show you is not a predictive model but a teaching tool to understand how such a disease might progress in a large population with no effective medical prevention. Clearly, medical intervention will be attempted and some of it undoubtedly will be successful. The use of this model is to show what those medical efforts need to prevent and some of the issues involved.
If you are math-challenged, don't worry too much about the equations. The graphs should give you a feel for what is happening. If you like math, the equations included will give you a means to play with the numbers yourself.
(Brief) Background On the Virus
The 2019-nCoV is a coronavirus discovered in Wuhan, China, and related to the viruses behind two previous disease outbreaks, SARS-CoV (Severe Acute Respiratory Syndrome) and MERS-CoV (Middle East Respiratory Syndrome). The coronavirus family normally produces disease in bats, not humans. 2019 Novel Coronavirus is just a placeholder title for a specific coronavirus which in some way has learned how to infect humans. The scientific community has not come up with a handier title yet, so for ease of discussion and in honor of the popular (but likely incorrect) idea that it came from eating bat soup, I am going to refer to it as the Bat Soup Surprise Virus, "Bat Soup" or BSSV for short.
As of this writing, Bat Soup has infected roughly 4,000 people, almost all in China, of whom almost 100 have died. There have been 5 confirmed cases in the US, but all of these are imported cases: people who were infected overseas before coming to (or returning to) the US. I am not even going to try to print and cite up-to-date numbers here because they are changing too rapidly.
Animal viruses do cross over to humans from time to time. In many cases, they fail to effectively replicate in humans and therefore simply fizzle out. This virus is concerning because it has demonstrated sustained human-to-human transmission over more than five generations of confirmed cases and does not show signs of weakening. Attempts are ongoing to contain it to China and to locate, isolate, and treat the leakers who have brought the disease to other countries. In China, a large-scale quarantine has affected more than 55 million people, including 11 million in the greater Wuhan area and 33 million in a neighboring city. The CDC is working to track contacts of infected people who came to the US and to process test samples to determine who among them may have the virus. This kind of effort is precisely what stopped the spread of SARS in 2003-2004.
Compared to SARS or MERS, this virus is more contagious but considerably less lethal, making it more likely to escape containment and spread but likely to cause fewer fatalities if it does. SARS had a case-fatality rate of about 10%, MERS about 37%; the Spanish Influenza of 1918 somewhat less than 5%; this disease is variously calculated at 4% or 3% and (for a variety of reasons) the actual number is likely to be lower as (if) it spreads.
The Basic Reproductive Number
A critical number for understanding disease epidemics of any type is the Basic Reproductive Number or R0 (often pronounced "R-nought"). This is often talked about but seldom actually explained. The Reproductive Number is the average number of successful transmissions of the disease from one individual. If one person manages to infect two other people (before recovering or dying) and each of those newly infected people manages to infect two other people (and so on), then the Reproductive Number (R) is 2.0, as shown in the following illustration:
Note that R is really an average. Bob might infect 4 people and Susan only 1 (avg = 2.5). It depends both on how contagious the disease is and on how many people Bob and Susan regularly come into contact with! For the same reason, R will almost certainly change over the course of an outbreak, as it encounters different conditions and as the medical community tries to stop its progress. The Effective Reproductive Number at time t or R(t) describes this change over the course of an epidemic. The Basic Reproductive Number, R(0), is then the "ideal" R at the start of the disease in a virgin population and overall (roughly) describes the capacity of the disease to move from human to human in a population. Strictly speaking, this number is different for Bat Soup in China versus Bat Soup in the US. The population density and social habits in Wuhan are just a little bit different from, say, rural Southwest Missouri or even Brooklyn. In common usage, R(0) is used to compare different diseases across populations. Just keep in mind that this common usage is not entirely accurate.
Notice what happens when R changes in the illustration. There are three "interesting" ranges for R in describing diseases:
- R is less than 1.0: On average, each infected person infects fewer than 1 other person in each generation of the disease. Over time, this disease will fail to spread and will die out. The Middle East Respiratory Syndrome (MERS-CoV) had an R0 of slightly less than 0.7 and did not effectively spread.
- R is exactly 1.0 (shown): Each infected person, on average, infects 1 new person. The disease remains in the population, going neither up nor down.
- R is more than 1.0 (shown): the number of infected people will tend to increase in the population from generation to generation of the disease. Growth is exponential: slow if R is near 1.0 and increasingly rapid as R rises. Many infectious diseases range from 1.0 to 3.0. Some extremely infectious airborne diseases (e.g. measles) can be 15, 20, or even more.
Handily, this tells us the goal of epidemiology in an outbreak: convince the Effective Reproductive Number to be less than 1.0. Public health efforts do not have to actually stop the disease or prevent every case. If R(t) is less than one, the disease will die out on its own, even if infection continues for a time. There is a "good enough" point which gets the job done and protects the public. This is how SARS was stopped.
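For readers who like to poke at numbers, the three ranges of R can be seen in a minimal Python sketch. It assumes R stays constant from one generation to the next and that the supply of susceptible people never runs low, neither of which holds in a real outbreak. The function name `cases_per_generation` is made up for this illustration.

```python
def cases_per_generation(r, generations, initial_cases=1.0):
    """Return the new cases in each generation, assuming a constant R.

    Assumption: every case causes `r` new cases on average and the pool
    of susceptible people never runs low (no saturation).
    """
    return [initial_cases * r ** g for g in range(generations + 1)]


# One value of R from each "interesting" range: below, at, and above 1.0.
for r in (0.7, 1.0, 2.0):
    series = cases_per_generation(r, 10)
    print(f"R = {r}: " + ", ".join(f"{c:.2f}" for c in series))
```

With R below 1.0 the series shrinks toward zero, at exactly 1.0 it stays flat, and above 1.0 it grows by the same factor every generation.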
Time in Disease Models: Incubation, Latency, and Generation Time
To understand disease spread, you have to not only understand how many people it can infect, but how long it takes to do it. This section explains some basic terms for time with respect to infections.
When one or more pathogens (the infective agents, whether virus, bacterium, fungus, etc.) enter a human host, they cannot spread or cause disease immediately. The pathogen has to multiply in the body first, bypassing or overpowering the immune system, and reach some critical mass. Someone sneezes on you and eventually you start sneezing on others. The average time it takes between initial exposure and the development of symptoms is called incubation time. The time between initial exposure and when the host becomes contagious is known as latency.
Often, we assume that these numbers are the same, that is, that the disease can be spread starting when symptoms appear. This makes sense, because symptoms like coughing, sneezing, diarrhea, etc., are in fact the very tools the pathogen uses to infect people. The two may not be precisely the same, however (and may or may not be for Bat Soup), but that discussion is outside the scope of this article. Just keep in mind that they may be different things and plough forward for now, intrepid reader.
This concept of latency is what provides the time clock in a disease model. The time it takes for a host to be exposed, for the infectious agent to multiply in their body, and for them to infect others is the Generation Time. The generation time will tend to be a bit larger than the latency period because the disease cannot successfully spread until the host becomes infectious, comes in contact with a susceptible person, and the transmission to the new host succeeds. Combined with R, it lets us figure out how quickly a disease will spread from generation to generation of the infectious agent (a virus in this case). We will make use of this number in a little bit.
The World Health Organization (WHO) has listed 4 days as the average incubation time for BSSV in a range from 1 to 13 days. That means that if someone is exposed and has not developed the disease in 14 days, it is not considered likely that they will. This then becomes a handy number for isolating suspected cases. The generation time used by one model (see References) is either 8.3 or 6.8 days, meaning that, on average, it is thought to spread most easily a bit after symptoms first appear. The first number is the generation time measured for SARS-CoV and so it simply assumes that Bat Soup works the same way (it may not). The second number assumes that the generation time for this virus is a bit shorter. Whether or not these numbers are correct is, again, outside our current scope, but they give us good numbers to work with for our model below.
Susceptibles and Immunity
Now that we know how many people a pathogen might infect and how quickly it can do it, we need to look at who it can infect. That subject can be complex, particularly when it has to take into account prior immunity and vaccination rates, but (fortunately or unfortunately) it is much simpler with respect to Bat Soup and a population which has never been exposed to it before.
The number of susceptibles, S, is initially the number of people in the population.
But what about after the disease starts to spread? In each generation of the virus, people get infected and those people either recover or do not (die). If they recover, they develop immunity (presumably) to future infections, so, either way, anyone who is infected is removed from the pool of future susceptibles. We have to track this number in our model. S(t) is the number of susceptibles at generation t.
The Reed-Frost Epidemic Model
And now we have enough pieces to get to our simple epidemic model, the Reed-Frost model of an epidemic. Wade Hampton Frost was a late-19th, early-20th-century epidemiologist. Lowell Reed and Frost developed this model in 1928. The Reed-Frost model is a simple iterative or step-based model, easy to calculate on paper or with a spreadsheet. In the form used here, it is deterministic (not random, or "stochastic"). It has a great many limitations, but is often used as a teaching model because it is easy to do, and it is easy to play with the numbers and get instant results.
(Reed-Frost is sometimes referred to as an SIR model (Susceptible-Infectious-Removed) and is one of the simplest in a family of models known as Compartmental Models. We'll touch on this a little more in a bit.)
For many reasons, Reed-Frost is not likely to be accurate, and we'll get into some of those reasons after we explore the model itself. It will, however, visually demonstrate the pieces we have explained above given real numbers from the current outbreak and then, hopefully, give the reader some insight into the practical effect of developments in the news. This, in turn, may make people either less or more afraid, depending on whether they currently fear too little or too much... In either case, the fear will hopefully be more rational and appropriate.
[Trigger warning: equations follow - if you are arithmophobic, just close your eyes, think of England, and go on with the text (after opening your eyes again).]
The Reed-Frost model uses the following formula:
$$C(t+1) = S(t)\,\bigl(1 - (1 - p)^{C(t)}\bigr)$$
Where:
- C(t+1) will be the number of cases for the next generation of the model.
- S(t) is the number of susceptibles at time (generation) t. (You will need to multiply by the number of days in a generation to get a time in days.)
- p is the probability that any given infected person will successfully infect any one particular susceptible person within one generation. This probability is fixed and does not change over the course of the epidemic in the Reed-Frost model!
- C(t) is the number of (active, not total!) cases in the current generation.
The idea is that you start with the initial number of infections (say, a single individual who gets off an aircraft from another country), and an initial number of susceptibles (the whole population in our case) and use that to calculate the next generation, C(t+1). You then subtract that count from the susceptibles and do it again. And again. And again. At each generation, the number of cases increases as the number of susceptibles decreases. Eventually, the chance of an infected person successfully contacting a susceptible starts dropping sharply and the number of new infections falls off. This creates a characteristic curve we shall see below.
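The loop just described fits in a few lines of Python. What follows is a minimal sketch of the deterministic recurrence, not a copy of the spreadsheet behind the graphs: the function name `reed_frost`, the choice to remove the initial cases from the susceptible pool before the first step, and the toy population of 1,000 are all assumptions made for illustration.

```python
import math


def reed_frost(population, p, initial_cases=1.0, generations=50):
    """Iterate C(t+1) = S(t) * (1 - (1 - p)**C(t)) deterministically.

    Returns a list of (generation, susceptibles, active_cases) tuples.
    Assumption: the initial cases are taken out of the susceptible pool
    before the first step.
    """
    susceptibles = population - initial_cases
    cases = initial_cases
    history = [(0, susceptibles, cases)]
    for generation in range(1, generations + 1):
        # 1 - (1 - p)**cases, written with log1p/expm1 so that a very
        # small p is not lost to floating-point rounding.
        chance_infected = -math.expm1(cases * math.log1p(-p))
        new_cases = susceptibles * chance_infected
        susceptibles -= new_cases  # the newly infected leave the pool
        cases = new_cases          # the previous generation is now "removed"
        history.append((generation, susceptibles, cases))
    return history


# Toy run: 1,000 people, one initial case, p chosen so S * p starts near 2.
for generation, susceptibles, cases in reed_frost(1000, 0.002, generations=15):
    print(f"gen {generation:2d}  susceptible {susceptibles:8.1f}  cases {cases:7.1f}")
```

Running it prints one row per generation: active cases rise, the susceptible pool drains, and then new cases fall away as infected people stop finding anyone left to infect.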
The Reed-Frost model makes a number of assumptions, including that p does not change over the course of the epidemic (it does not allow for successful intervention or even for differences in population density and habits within the population, say rural Alabama vs. urban California). It assumes that contact is random and the population is thoroughly mixed. Sometimes these assumptions make it pessimistic, other times optimistic, still other times just a bit off. If we keep these things in mind, it is a useful tool.
As with our discussion of R, if S(t) * p is above 1, the epidemic continues to grow. In contrast, if it is below 1, the epidemic will tend to shrink. S(t) * p models the Effective Reproduction Number or R(t) for a given generation. Given a population of 100 people, an R(t) of 2 gives a p of 2%, an R(t) of 1 gives a p of 1% and so forth, but this input number must get smaller with larger populations.
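The relationship between p, the population, and R(t) is plain arithmetic, sketched below in Python. The function names are made up for this illustration, and the only numbers used are the 100-person example from the text and the 331 million population used for the graphs.

```python
def transmission_probability(r, susceptibles):
    """p chosen so that susceptibles * p equals the desired R."""
    return r / susceptibles


def effective_r(susceptibles, p):
    """R(t) as approximated in the text: S(t) * p."""
    return susceptibles * p


# The 100-person example: an R of 2 corresponds to a p of 2%.
p_small = transmission_probability(2.0, 100)
print("p for 100 people:", p_small)

# With p fixed, R(t) falls as the susceptible pool is used up.
for remaining in (100, 50, 25):
    print(remaining, "susceptible -> R(t) =", effective_r(remaining, p_small))

# The same R in a much larger population needs a far smaller p.
print("p for 331 million people:", transmission_probability(2.0, 331_000_000))
```

Because p stays fixed, the only thing that lowers R(t) in this model is the shrinking pool of susceptibles, which is why the epidemic turns over once S(t) falls below 1/p.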
Given those notes, we show a graph of the Reed-Frost model for Bat Soup given an initial population of 331 million, an R(0) of 2.1, a case-fatality of 3%, and a generation time of 6.8 days:
The number of cases builds slowly, the number of susceptibles falls, and they cross here on day 176.8 (generation 27). The peak number of cases is a little over 56 million with a final death toll of 8.2 million. We can see from this that even with a disease with a relatively low lethality but good ability to spread, the losses can be considerable. The number of people who are simultaneously ill can itself be "problematic" even if most of them recover. We also see, however, that the build to peak happens over almost half a year even in this dire case.
Now we look at a different case, one where the R(0) is 1.5 (the minimum WHO estimate), the case-fatality is only 1% (but still 10 times that of common influenza), and the generation time is 8.3 days.
In both cases, our spreadsheet takes the epidemic out 50 generations. In this second one, we see that the peak happens at well over a year (390 days, generation 48). At peak, there are just shy of 16 million simultaneous cases and a death toll (by generation 50) of 1.75 million. This kind of scenario would take into account that our health system and prevention measures would both slow the spread and produce fewer fatalities than in China.
Lastly, we produce a graph with an R(0) of 1.5, case-fatality of 2%, generation time of 6.8 days.
Here we peak at 20.7 million active cases in generation 46 (306 days) with a cost of 3.5 million lives by generation 50. We can vary the graph in a number of ways, but you can see that the curves have the same general shape.
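If you would rather rerun these three scenarios than take the graphs on faith, here is a minimal, self-contained Python sketch. It is an independent reimplementation, not the spreadsheet behind the graphs, so its output need not match the figures quoted above exactly. The assumptions are a single initial case, p set to R(0) divided by the population, generations counted from 0, and deaths taken as the case-fatality rate times everyone ever infected. The name `run_scenario` is made up for this illustration.

```python
import math


def run_scenario(population, r0, case_fatality, generation_days, generations=50):
    """Deterministic Reed-Frost run, summarised as peak and totals.

    Assumptions: one initial case, p = r0 / population, generations
    counted from 0, deaths = case_fatality * everyone ever infected.
    """
    p = r0 / population
    susceptibles, cases = population - 1.0, 1.0
    total_infected, peak_cases, peak_generation = cases, cases, 0
    for generation in range(1, generations + 1):
        # Same recurrence as the formula: S(t) * (1 - (1 - p)**C(t)).
        new_cases = susceptibles * -math.expm1(cases * math.log1p(-p))
        susceptibles -= new_cases
        cases = new_cases
        total_infected += new_cases
        if cases > peak_cases:
            peak_cases, peak_generation = cases, generation
    return {
        "peak_cases": peak_cases,
        "peak_day": peak_generation * generation_days,
        "total_infected": total_infected,
        "deaths": case_fatality * total_infected,
    }


# (R(0), case-fatality, generation time in days) for the three graphs.
scenarios = [(2.1, 0.03, 6.8), (1.5, 0.01, 8.3), (1.5, 0.02, 6.8)]
for r0, cfr, days in scenarios:
    result = run_scenario(331_000_000, r0, cfr, days)
    print(r0, cfr, days, {key: round(value) for key, value in result.items()})
```

Changing one input at a time is the quickest way to see which numbers move the peak and which move the timing.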
What the Model Shows Us
From these different graphs, we can get a feel for some principles of epidemiology in a case like this. Specifically, we see that, no matter what R(0) is, the virus will eventually touch almost the entire population if it is not actively stopped: it is just a question of how long it takes. That also means that for a given case-fatality rate, the final death toll doesn't really change, it is just spread over a shorter or longer period of time.
We also see that being able to adjust the rate of spread dramatically changes the peak number of infections and the amount of time we have to come up with interventions. Having, say, 50 million people all sick in bed at one time would clearly bring many functions of society to a halt, even with a moderate cost in lives. This means, in turn, that contact tracing, appropriate travel restrictions, self-quarantine, closing schools or public events where necessary, etc., can make a phenomenal difference in the overall cost of the epidemic in both economic and human terms. At best, it can bring the R(t) to below 1.0 and actually halt the spread. According to the report I got these input numbers from, the spread of this disease must be slowed by at least 60% to halt the epidemic [Imai, et al, "Report 3", see References below]. [Update 11 March 2020: as of the time the WHO Joint Mission to China returned and published (24 Feb), this has actually been achieved in China. New cases are still occurring, but the outbreak there has stopped growing. Now China is sending a mission to help Italy.]
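The size of the slowdown needed to halt an epidemic follows from simple arithmetic. If interventions block a fraction f of transmission, R becomes R * (1 - f), which drops below 1.0 once f exceeds 1 - 1/R. The Python sketch below is that textbook calculation and is not taken from the cited report; `required_reduction` is a made-up name.

```python
def required_reduction(r):
    """Fraction of transmission that must be blocked to bring R down to 1."""
    if r <= 1.0:
        return 0.0  # already at or below the threshold
    return 1.0 - 1.0 / r


for r in (1.5, 2.1, 2.5):
    print(f"R = {r}: block at least {required_reduction(r):.0%} of transmission")
```

An R of 2.5 works out to the 60% figure, while the lower R values used for the graphs above would need a smaller reduction.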
If spread is never halted but simply works its way through the susceptibles in the population one generation at a time, a new disease may become "endemic": it reaches an equilibrium state where immigration and births provide new hosts to balance those lost to immunity or death. Humankind deals with a number of such endemic diseases.
[Update 11 March 2020: The Flatten the Curve chart shows this same concept in very simple form. At the point where we now have community spread in the US and over 100 countries with cases globally, our chances of "stopping" the virus are close to zero. But if we can slow spread, it makes the difference between an outbreak that the US health system can keep up with and one, like Italy, where the system is overwhelmed and people die who might otherwise be saved.]
What the Model Does Not Show But Might Be Important
As mentioned above, the Reed-Frost Model has a number of shortcomings. Better models have been produced in general and specific models are being produced in the literature for this particular virus. All of them are going to be "a bit more complicated" than what we have here. Let's briefly discuss some of the important aspects of real-world virus behavior against our crude model.
Fixed p and Nosocomial Infections
As already mentioned, p is fixed in this model. We would expect that public health efforts from the national to community to individual level would lessen the spread over time. One of the critical places this matters is with so-called nosocomial infections. This is a strange word you may encounter in the news but it is really very simple: a nosocomial infection is one which occurs in a healthcare setting, anywhere from the first responder (maybe a paramedic or LPO who first discovers a victim) to the hospital ICU and everywhere in between. Paradoxically, the healthcare apparatus can be the greatest risk in combating infection. In past epidemics, healthcare workers, including first responders, paramedics, LPOs (who may be the first responder before paramedics are called), nurses, doctors, etc., have been exposed to infectious disease at rates 10 or 100 times those of the rest of the population. When these health care workers start to get infected and sick in numbers, it strips the population of the very people who are depended upon to protect everyone else.
This is one of the reasons that infectious disease precautions and procedures are drilled so hard into everyone in the healthcare system, even volunteers like myself who are on the very edges of the system. It is why we drill things like "gloves and masks" and proper hand-washing very hard in training (and will certainly be doing so in the coming year!) Controlling nosocomial infections has the potential to dominate the course of a disease and did so with SARS. It is also important that trained volunteers exist in the community in advance to step in as attrition reduces the number of professional responders available for routine tasks. Everyday emergencies do not simply stop during an epidemic.
Self-Protection For Communities and Families
Some of the same basic techniques, including disciplined disinfection and handwashing, also reduce R(t). Every table (or, these days, touchscreen ordering device) at a restaurant which gets disinfected, every doorknob cleaned, can stop several infections. But the best approach for the general populace may simply be to temporarily reduce contacts with others (self-quarantine) to deny the virus opportunities for transmission. A bit of preparedness, such as a well-stocked pantry, materials for temporary home-schooling, or the ability to telecommute to work go a long way toward making self-quarantine possible and effective.
New Interventions?
We may end up with new interventions during the course of an epidemic, such as experimental vaccines (usually prioritized for healthcare workers for the reasons given above), better antivirals, etc. All of these can change the curve we see.
To Everything There Is a Season...
The other thing this model does not show is normal seasonal variation. With 170 days or more to peak in these graphs, the yearly changes in weather and activity will affect the course of infection. Infectious droplets from coughing or sneezing are not as effective at spreading disease in the summer when people spend more time outside, the windows are open, and schools are closed. If the start of sustained spread doesn't happen until warmer weather, the progress should be considerably slower. It would, however, pick up again as schools reopen and the weather turns cold (typical flu season). Past flu pandemics, such as the 1918 Spanish Flu, progressed in waves, and this is one of several likely reasons.
Population Differences and Super-Spreaders
Some locations tend to spread illness no matter what interventions are taken or how low R(t) can be gotten outside of them. Major measles outbreaks frequently start at places like Disney World or university mega-campuses. When an infected person can come into contact with hundreds of people on a typical day, even a very low p can result in infections. Similarly, certain people (say, teachers, salespeople, paramedics, bat soup connoisseurs...) tend to be exposed to and potentially spread disease much more than the rest of the population. These sub-populations can continue to be sources of infection before an epidemic really builds and long after it wanes. The Reed-Frost model is simply not sophisticated enough to show such super-spreaders.
SEIR: A Slightly More Complex Model
To see how this kind of thinking applies to a real exploration of Bat Soup, you might try looking at Wu, Leung, and Leung, "Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study" (full citation in References) to see how much you can follow along and what important concepts you can pick out.
- What numbers do they use for R(0), for generation time, etc.?
- What do they assume about incubation and latency?
- How do they try to control for the success of interventions in limiting the spread of disease?
- What kind of timeline do the authors suggest for international spread?
There are a couple of pieces of their argument that I am not sure I fully understand or fully agree with, and I would not expect someone working from just my explanations here to do more than skim. The challenge, if you accept it, would be to see whether you could understand enough to judge how the model they present could be important and what it says about potential spread.
That paper presents an SEIR model (Susceptible-Exposed-Infected-Removed; Trigger Warning: scary equations in link) to try to take into account air travel data from China to better understand the real scope of the infection inside Wuhan (including the likely very high number of cases even the Chinese authorities do not know about) and then predict spread forward in the regions outside of Wuhan in China and internationally.
SEIR is another in the family of Compartmental Models and it is usually presented as a system of Ordinary Differential Equations (ODE), requiring knowledge of calculus, which I rather wanted to avoid in the body of this article. The relationships and results are shown in decent graphs (except for the European number formatting that always takes me a bit to adjust to). At the very least, this should give you a taste for what a typical real-world publication may look like (and why most people don't read them?).
Conclusion, References, Further Reading
So, now that we have almost gotten to the end, hopefully you have a bit better grasp of how infectious disease spreads, perhaps enough to better understand why one outbreak may be more worrying than another, and why some developments in the news may be something that needs to be paid attention to while others can be passed over. An understanding of terms and principles can help you decide whether you need to worry and how much worry is appropriate.
Personally, however, I figure that some basic preparations and precautions are almost always justified, simply because if Bat Soup does not take wing, something else someday most definitely will. Concentrate on those preparations which will not hurt you either way and which you will eventually use regardless (say, some long-term food storage or a bottle or two of disinfectant, the means to work from home when you need to, good nutrition including vitamin C and D, some first aid training, etc.).
References and Links
- If you want my spreadsheet for educational use, ask. I am working on adding some notes and making it a little more user friendly.
Reed-Frost and Compartmental Models
- The Reed-Frost Model has a basic entry in Wikipedia; a better but still approachable description is in "Epidemiology - An Introduction" by Kenneth J. Rothman. 2nd Edition. Oxford University Press. Oxford. 2012, pp 118-119. Kindle edition available.
- The Basic Reproduction Number (R(0)) and the other concepts above can also be explored on Wikipedia and are defined in Rothman 2012. Both sources also have tables of estimated R(0) for a number of diseases. I have just seen (28 January) that the Wikipedia article now includes some referenced R(0) estimates for Bat Soup, which, obviously, Rothman 2012 does not.
- Compartmental models in general have a Wikipedia entry, including exploration of SIR and SEIR models (Calculus again!). There is also a long article/report/short book by Fred Brauer freely available (PDF): [Brauer, F. (2008). Compartmental models for epidemics. Vancouver, B.C.: Research Gate. Retrieved from https://www.researchgate.net/publication/228594171_Compartmental_models_for_epidemics]
- The Bat Soup SEIR model I discuss above: [Wu, J. T., Leung, K., & Leung, G. M. (2020). Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet, 6736(20). https://doi.org/10.1016/S0140-6736(20)30260-9]
Numbers Used For My Bat Soup Model
- For the R(0) and generation time numbers used above: [Imai, N., Cori, A., Dorigatti, I., Baguelin, M., Donnelly, C. A., Riley, S., & Ferguson, N. M. (2020). Report 3: Transmissibility of 2019-nCoV. London. Retrieved from https://www.biorxiv.org/content/10.1101/2020.01.23.916395v1, Accessed 27 January 2020]
Exploring Epidemiology
- For more in-depth exploration of epidemiology, I strongly recommend ["Principles of Epidemiology, a Self-Teaching Guide" by Roht, Selwin, et al. Academic Press, NY., 1982.] It is one of the books I started with "in the day" and is still useful. I have recently discovered that it is available as an e-book. It provides a clear path to work through concepts, terms, and exercises, looking up the topics in other books as necessary. That is why it does not go out of date: if you use more current books and articles to look up the information to do the exercises, you will keep current with new developments. This could be a very useful approach for, say, a homeschool unit for an adventurous older student. Most of it is approachable with a strong grasp of algebra and basic statistics. The tools needed to rough out models, like spreadsheets with good built-in functions or even programming environments like Lua, are all freely available these days.
- Epidemiology texts, resources, and papers can sometimes be awfully expensive. Being retired, I tend to look for books at library sales where extremely expensive reference books can go for a few dollars. I also periodically visit the St. John's Cancer Center Community Health Library in Springfield, MO, which has a fantastic array of resources, including references and journals. Getting a library card from them is not expensive (though I forget the exact amount currently). If you are not local, you may have similar resources in your community, such as a college or university library open to the public. Some institutions, including my college, also offer alumni JSTOR accounts with a selection of journals; it may be worth checking to see if you have access to such a program.