The number of confirmed COVID-19 cases in the US surpassed 1.1 million on May 2. However, the actual number of cases is probably significantly larger: that very same day, we learned that 12.3% of New York State tested positive for coronavirus antibodies. That's 2.4 million people, in one state. The truth is that no one really knows how many people have been infected.
How is such a disparity possible? What are its implications?
The answer to these questions starts with understanding the role of testing policies.
Let's consider a simple example. The graph opposite represents the population of a fictional city. It has about 1000 inhabitants, of whom 10 are initially infected by a mysterious virus.
Some disease carriers are asymptomatic, but not very contagious. Others present symptoms that can be deadly, and are highly contagious. Most of them will recover and become immune.
As time goes by, the disease spreads, until there is no active case left.
Reorganizing these dots yields a clearer picture of what the city has been through: the majority of the population got the disease.
We can then visualize the spread of the disease as a function of time: the virus's propagation slows as the number of uninfected people decreases.
As we are working with a simulation, we know exactly what is happening at every point in time. In particular, we know how many people were infected. Let's focus on this number: this new curve represents the total number of people who have been infected as a function of time.
This number encompasses the currently infected population (both symptomatic and asymptomatic) as well as recovered cases and casualties.
Let's now consider a first testing scenario.
In this scenario, we have access to a limited number of tests per day. This corresponds to the situation unfolding in many countries today, including the US.
We suppose that most symptomatic carriers seek testing, as long as they have not already been confirmed ill. A smaller fraction of asymptomatic carriers also try to get tested.
However, access to tests is restricted: the city can run only 5 tests per day. It may not sound like much, but it's actually not that bad considering our population size.
Out of the actual cases reached by the end of the pandemic, only are detected.
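The capacity constraint in this first scenario can be sketched in a few lines of Python. This is only an illustration, not the code behind the simulation: the function name `run_limited_tests` and its arguments are hypothetical, and it assumes that tests are perfectly accurate and that, when demand exceeds capacity, the people tested are picked at random among those seeking a test.

```python
import random


def run_limited_tests(seekers, capacity, rng):
    """Return the carriers confirmed today under a daily test limit.

    seekers  -- ids of carriers who want a test today (hypothetical input)
    capacity -- maximum number of tests the city can run per day
    rng      -- a random.Random instance, passed in for reproducibility
    """
    if len(seekers) <= capacity:
        # Enough tests for everyone who asks.
        return list(seekers)
    # Assumption: the scarce tests are allocated at random among seekers.
    return rng.sample(seekers, capacity)


rng = random.Random(0)
seekers_today = list(range(12))  # 12 carriers want a test (made-up number)
confirmed_today = run_limited_tests(seekers_today, capacity=5, rng=rng)
print(len(confirmed_today))  # 5: the other 7 carriers stay uncounted today
```

Carriers who are turned away may recover before they ever get a test, which is one way confirmed counts can fall behind actual counts.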
Now, let's consider a second testing scenario.
We are making the same assumptions about who wants to get tested, but in this scenario, there is no limit on the number of tests: every individual who wants a test gets tested.
Furthermore, a small, random sample of the population is also tested for antibodies every day: if they have been infected at any point, they will be counted.
This testing policy is significantly more effective: out of actual cases, are detected.
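The random antibody sampling in this second scenario could look something like the Python sketch below. It is a simplified illustration under stated assumptions rather than the simulation's own code: the function `antibody_survey` and its parameters are hypothetical, the test is assumed to be perfectly accurate, and the sample is drawn uniformly from the whole population.

```python
import random


def antibody_survey(ever_infected, confirmed, population_size, sample_size, rng):
    """Return the ids newly detected by one day's random antibody sample.

    ever_infected -- set of ids infected at any point (current or past)
    confirmed     -- set of ids already counted as cases
    """
    # Assumption: people are sampled uniformly, without replacement.
    sampled = rng.sample(range(population_size), sample_size)
    # A sampled person is a new detection only if they were infected at
    # some point and have not been counted already.
    return {p for p in sampled if p in ever_infected and p not in confirmed}


rng = random.Random(0)
ever_infected = set(range(600))  # made-up: 600 of 1000 people were infected
confirmed = set(range(100))      # made-up: 100 of them are already counted
newly_detected = antibody_survey(ever_infected, confirmed, 1000, 10, rng)
print(len(newly_detected))
```

Because antibodies reveal past infections, this kind of sampling can pick up people who recovered without ever seeking a viral test.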
You might think that such gaps in testing did not arise for COVID-19. But they did. And it is an ongoing issue.
Those numbers are crucial, because our ability to adapt public policy and limit the spread of the virus is directly dependent upon the quality of our data and models.
Epidemiologists and health officials are painfully aware of these issues, and try to adapt to the situation as best they can despite them.
However, it is difficult to adapt when we do not even know how deadly the virus is.
To illustrate the effect that testing policies can have on crucial data points, let's focus on the fatality rate.
In our simulation, the actual fatality rate was .
But using numbers from the first policy, we would estimate , while getting with the second policy.
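To see why the estimate depends on the testing policy, here is a small worked example in Python. The numbers are invented for illustration and are not the simulation's results. It assumes that the fatality rate is estimated as deaths divided by detected cases, and that every death is counted regardless of the policy.

```python
def fatality_rate(deaths, cases):
    """Deaths divided by the number of cases used as the denominator."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical figures, chosen only to illustrate the mechanism.
deaths = 20
actual_cases = 800      # everyone ever infected (known only in a simulation)
detected_limited = 200  # cases found with a capped number of daily tests
detected_broad = 500    # cases found with unlimited tests plus antibody sampling

actual = fatality_rate(deaths, actual_cases)
estimate_limited = fatality_rate(deaths, detected_limited)
estimate_broad = fatality_rate(deaths, detected_broad)

print(f"actual: {actual:.1%}")                  # actual: 2.5%
print(f"limited tests: {estimate_limited:.1%}")  # limited tests: 10.0%
print(f"broad tests: {estimate_broad:.1%}")      # broad tests: 4.0%
```

With the same number of deaths, a smaller count of detected cases gives a larger estimated fatality rate, so the policy that misses more cases makes the disease look deadlier.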
All testing policies underestimate the actual number of cases, but the size of the underestimate varies greatly depending on the specifics. And that is without going into additional factors that also play out in reality, such as false positives and false negatives in test results.
All of this results in a situation where we do not know precisely what is happening, and we cannot rely on what is happening elsewhere as a basis for our decisions and public health policy.
These decisions affect our lives, and understanding their limitations is crucial.
This article revolves around a simulated disease. We use a stochastic variant of the SEIR model, which is a classical compartmental model in epidemiology. You can experiment with our model below.
Initial population 

Total population  
Number of initial cases  
Testing policy 

Maximum number of viral tests per day  
Proportion of symptomatic carriers seeking a test per day  
Proportion of asymptomatic carriers seeking a test per day  
Number of random antibody tests per day  
Note: individuals who have already tested positive will never be tested again. Casualties are also automatically identified. 

Disease parameters 

Daily probability of an asymptomatic carrier becoming symptomatic  
Daily probability of an asymptomatic carrier recovering  
Daily probability of a symptomatic carrier recovering  
Daily probability of a symptomatic carrier dying  
Average number of daily infections per asymptomatic carrier  
Average number of daily infections per symptomatic carrier  
Note: The number of daily infections per carrier is measured in a theoretically infinite susceptible population. The real number of infections decreases as the proportion of the population that has been infected increases. 
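For readers who prefer code to sliders, the parameters above can be wired into a minimal stochastic compartmental simulation. The Python sketch below is an assumption-laden illustration, not the model used in this article: the function `simulate`, its parameter names, and all default values except the population of 1000 and the 10 initial cases are made up, newly infected people are treated as asymptomatic carriers right away (no separate exposed stage), and testing is left out entirely.

```python
import math
import random


def simulate(population=1000, initial_cases=10,
             p_symptoms=0.1, p_recover_asym=0.1,
             p_recover_sym=0.1, p_die_sym=0.01,
             beta_asym=0.1, beta_sym=0.5,
             seed=0, max_days=1000):
    """Run one epidemic and return a list of daily compartment counts."""
    rng = random.Random(seed)
    s, a, i, r, d = population - initial_cases, initial_cases, 0, 0, 0
    history = []
    for _ in range(max_days):
        if a + i == 0:
            break  # no active case left
        # Daily infection probability for each susceptible person. Summed
        # over all susceptibles, the expected number of infections is about
        # (beta_asym * a + beta_sym * i) * s / population, so it shrinks as
        # fewer people remain susceptible.
        force = (beta_asym * a + beta_sym * i) / population
        p_infection = 1 - math.exp(-force)
        new_infections = sum(rng.random() < p_infection for _ in range(s))
        # Asymptomatic carriers: develop symptoms, recover, or stay as is.
        to_symptomatic = recovered_asym = 0
        for _ in range(a):
            u = rng.random()
            if u < p_symptoms:
                to_symptomatic += 1
            elif u < p_symptoms + p_recover_asym:
                recovered_asym += 1
        # Symptomatic carriers: die, recover, or stay as is.
        deaths = recovered_sym = 0
        for _ in range(i):
            u = rng.random()
            if u < p_die_sym:
                deaths += 1
            elif u < p_die_sym + p_recover_sym:
                recovered_sym += 1
        s -= new_infections
        a += new_infections - to_symptomatic - recovered_asym
        i += to_symptomatic - recovered_sym - deaths
        r += recovered_asym + recovered_sym
        d += deaths
        history.append({"susceptible": s, "asymptomatic": a,
                        "symptomatic": i, "recovered": r, "dead": d,
                        "ever_infected": population - s})
    return history


history = simulate(seed=1)
last_day = history[-1]
print(len(history), last_day["ever_infected"], last_day["dead"])
```

The `ever_infected` field is the quantity the article tracks: current carriers plus recovered cases and casualties. Changing `seed` gives a different run of the same stochastic process.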