Applying a statistical model to publicly available genome data offers clues to the beginnings of the coronavirus epidemic in China, say researchers.
Since the current coronavirus epidemic started, scientists and authorities have determined the genetic fingerprint of virus samples from numerous affected countries. More than 100 of these gene sequences, which are present in coronaviruses in the form of RNA, are available in public databases.
Tanja Stadler, professor of computational biology in the department of biosystems science and engineering at ETH Zurich in Basel and an expert in questions of molecular epidemiology, analyzed such sequences with a statistical model her group developed for studying the genetic genealogy of pathogens.
Stadler and colleagues have made their analysis available to other scientists on Virological, an online portal. They point out that their work has not gone through review by other scientists, which is standard practice in research, because this would take too long in a situation like the current one. Stadler also stresses that the quality of her analysis can be only as good as the quality and quantity of genetic data published. In this study, her team analyzed 93 RNA sequences, most of them from China, with 38 from other countries.
Stadler’s analyses suggest that the epidemic in China began in the first half of November 2019, whereas most previous estimates assumed that the virus did not pass from an animal to the first human until the second half of November.
“The widespread hypothesis that the first person was infected at an animal market in November is still plausible,” Stadler says. “Our data effectively rule out the scenario that the virus circulated in humans for a long time before that.”
Stadler also analyzed the dynamics of the epidemic before the city of Wuhan was quarantined on January 23, 2020. She used the genetic data to calculate the new coronavirus’s basic reproduction number, a figure that indicates the average number of people an infected person goes on to infect. According to Stadler’s estimates, it lay between 2 and 3.5 in the period in question. This corroborates previous estimates based on the number of confirmed coronavirus cases, which suggested a figure between 2 and 4. This means that infections spread much more quickly than with seasonal influenza (which typically has a basic reproduction number below 1.5).
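To give a feel for what these reproduction numbers imply, here is a minimal sketch in Python. It is not Stadler's model. It assumes a deliberately idealized chain of transmission in which every infected person infects exactly the basic reproduction number of others, with no immunity and no control measures, and it counts generations of infection rather than days. The function name and the choice of five generations are illustrative assumptions.

```python
def new_infections_per_generation(r0: float, generations: int) -> list[float]:
    """Idealized growth: one initial case, each case infects r0 others.

    Assumption: no immunity, no control measures, fixed r0.
    Returns the number of new infections in generation 0..generations.
    """
    return [r0 ** g for g in range(generations + 1)]


# Reproduction numbers quoted in the article.
scenarios = [
    ("seasonal influenza (upper figure quoted)", 1.5),
    ("coronavirus, lower estimate", 2.0),
    ("coronavirus, upper estimate", 3.5),
]

for label, r0 in scenarios:
    series = new_infections_per_generation(r0, 5)
    print(f"{label}: R0 = {r0}, new infections in generation 5: {series[-1]:.1f}")
```

Even in this toy setting, the gap between a value below 1.5 and a value of 2 to 3.5 compounds with every generation, which is why small reductions in the number matter for control measures.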
“The basic reproduction number is one of the central parameters of an epidemic,” Stadler says. “It provides important information on the effectiveness of measures such as quarantine. Control measures are effective only if they are able to reduce this number.” That’s why Stadler wants to determine what this number is during the timespan of the Wuhan quarantine. However, she says, the data for this period in Wuhan are unclear, which makes a reliable analysis impossible for now.
Because viral genomes are constantly changing, Stadler could use these changes to reconstruct the evolutionary history of the virus. “Using statistical methods, we can calculate how many people were infected at any point in time in the past,” she explains.
Her analysis shows that on January 23, between 4,000 and 19,000 people must have been infected. At that time there were 581 confirmed cases of the disease. This means that in the worst case, only 1 in 33 infected persons appeared in the official statistics; in the best case, 1 in 7.
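The ratios follow directly from dividing the estimated number of infected people by the number of confirmed cases. A short Python check of that arithmetic, using only the figures quoted above (the variable and function names are illustrative):

```python
# Figures quoted in the article for January 23, 2020.
confirmed_cases = 581
estimated_infected_low = 4_000
estimated_infected_high = 19_000


def infected_per_confirmed(estimated_infected: int, confirmed: int) -> int:
    """Return the number of infected people per confirmed case, rounded."""
    return round(estimated_infected / confirmed)


best_case = infected_per_confirmed(estimated_infected_low, confirmed_cases)
worst_case = infected_per_confirmed(estimated_infected_high, confirmed_cases)

print(f"Best case: 1 in {best_case} infected persons was confirmed")
print(f"Worst case: 1 in {worst_case} infected persons was confirmed")
```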
Stadler emphasizes that there are methods other than hers for determining epidemiological parameters. However, her method, which analyzes the genomes, has a great advantage in that it allows reliable conclusions to be drawn even with data from relatively few patients. In particular, her method is beneficial in situations where it is no longer clear who infected whom. This is currently the case in Italy, and has been the case in China for some time.
Finally, Stadler’s method would even allow the real-time analysis of an epidemic, which would enable the authorities to continuously review the effectiveness of control measures and adjust them accordingly. A prerequisite for this would be regular spot checks to examine the viral genome in newly infected persons. However, at present almost no sequence data are being published for new viral genomes from Wuhan.
Stadler will continue her analysis and expand it with newly published genome data.
Source: ETH Zurich