All These Mutant Virus Strains Need New Code Names
As potentially more dangerous variants of Covid-19 spread, scientists are taking a crack at giving them clearer names that’ll help in the fight. Another very interesting piece from Wired magazine on the names of SARS-CoV-2 variants.
IT’S NOT TULIO de Oliveira’s fault that every mutated variant of the virus that causes Covid-19 has more than one name, much less that they all read like hack-proof passwords. But de Oliveira might have inspired a new, international push to change that system.
A bioinformatician and director of the KwaZulu-Natal Research and Innovation Sequencing Platform at the University of KwaZulu-Natal in South Africa, de Oliveira leads the team that in December identified one of those variants—a version of SARS-CoV-2 with a mutation that seemed to make it more easily transmitted from person to person. In a pandemic, that’s bad.
De Oliveira did what you’re supposed to—he shared what he knew about the genetic makeup of this new variant with the thousands of scientists around the world trying to beat the disease. One of the best ways to do that is to understand how any given variant relates, evolutionarily and epidemiologically, to all the others—to figure out why some mutations might confer the ability to spread faster and more easily, perhaps while evading some vaccine formulations. His team sequenced the viral variant’s genes and uploaded the results to a database called GISAID, the 'Global Initiative on Sharing All Influenza Data.'
If you’ve heard about the viral offshoot de Oliveira’s team found, it might be as 'the variant first identified in South Africa.' But that’s not how scientists name things. To them, it was B.1.351 (pronounced 'bee dot one dot three five one').
This is where de Oliveira contributed to making things more complicated. See, researchers in the United Kingdom had also found a rapidly spreading variant, which they'd named SARS-CoV-2 VOC 202012/01, a less-than-felicitous way of saying it was the first 'Variant of Concern' identified in December of 2020. Its key mutation seemed to be the same as in the variant de Oliveira had found, a change technically called 'N501Y.' But one way to classify variants is according to their mutations. So things were about to get confusing.
But de Oliveira had a pitch. 'It became clear that the variants had a different role in the classification,' de Oliveira tells me. He proposed that the UK researchers' variant would be called 501Y.V1 (and also B.1.1.7). De Oliveira's would still be B.1.351, and also 501Y.V2. You might have also heard of another variant called 501Y.V3 … or one called P.1 … or another called CAL.20C. Maybe you’re also trying to keep track of more genetic mutations, like E484K (also known as 'Eeek') or D614G, 'Doug.' They’re funny names that belie some bad, bad things.
Scientists don’t like to name diseases after places or people (too stigmatizing). They prefer something more precise—and more esoteric. What a virus is called is also, often, what it does, or where it fits on the family tree. Scientists argue about nomenclature, about the naming of things, because it’s a proxy for fighting about what things are—methodologically, objectively, and philosophically. But in the middle of a pandemic, that might not be good enough. 'It’s too complicated,' Maria Van Kerkhove, head of the World Health Organization’s emerging diseases unit, said in a Q&A at the end of January. 'We don't actually have to name every variant that’s of interest, but we do need to name the ones that are important, the ones that have potential impact on severity, on transmission, and any of them that have any impact on therapeutics and vaccines.' So with all the bigwigs of Covid nomenclature, the WHO has begun to try to untangle the naming mess.
No epidemic has ever had so many people sequencing so many samples of a virus. There was bound to be a pile-up. 'It might seem like it burst onto the public stage. But in the scientific community, this discussion about, ‘How do we talk about it? What nomenclature do we use?’ has been brewing for a while,' says Emma Hodcroft, a molecular epidemiologist at the University of Bern and co-developer of Nextstrain, one of the main efforts to organize viral genetic sequences. 'A lot of it does depend on what you’re doing. Are you doing public health intervention or large scale evolution?'
GISAID started in 2008, after researchers around the world expressed some reticence at putting sequence data from their surveillance of bird flu into public domain databases. Under-resourced scientists didn’t want to drop a new sequence but then get scooped on the analysis by some other researcher with a zillion-dollar lab. And as GISAID got more and more data, the people who ran it had to come up with a way to identify each sequence and put them all into context with one another. Now it’s the main data repository for SARS-CoV-2 genomes.
But the world of Covid nomenclature has two more great and noble houses. Nextstrain, based at the Fred Hutchinson Cancer Research Center and University of Basel, is one. Its organization revolves around clades, big branches on the phylogenetic tree of life. (Nextstrain started out doing the same job for influenza.) Its names have a cheat code: clades are organized by the year they’re discovered and a letter of the alphabet, and then according to specific mutations of interest. The de Oliveira team’s variant had a bunch of mutations, but the N501Y was important. (The mutation changes an asparagine, abbreviated with the letter N, to tyrosine, abbreviated with a Y, at the 501st amino acid on the virus’ spike protein, in the RBD (that’s Receptor Binding Domain) that attaches to the human ACE2 receptor (that’s Angiotensin-Converting Enzyme 2).)
Easy, right? (Ahem.) But then things got even more complicated. The one the UK researchers were seeing had the same mutation, among many others. To distinguish it from de Oliveira’s, each got a new designation, appending 'V1' on the one from the UK and 'V2' on the other. Another similar variant that led back to Manaus, in Brazil, came to be 'V3.'
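For readers who want to see the mutation shorthand taken apart, here is a minimal Python sketch of how a label like N501Y can be decoded: one letter for the original amino acid, a position number, and one letter for the replacement. The function names and the small lookup table are made up for illustration and are not part of any of the naming systems discussed here; the table covers only the amino acids behind the mutations named in this article.

```python
import re
from typing import NamedTuple

# One-letter amino acid codes, limited to those behind the mutations
# named in this article (N501Y, E484K, D614G).
AMINO_ACIDS = {
    "N": "asparagine",
    "Y": "tyrosine",
    "E": "glutamic acid",
    "K": "lysine",
    "D": "aspartic acid",
    "G": "glycine",
}


class Substitution(NamedTuple):
    original: str      # amino acid in the reference sequence
    position: int      # position along the protein (here, spike)
    replacement: str   # amino acid in the variant


# Hypothetical helper pattern: letter, digits, letter.
_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'N501Y' into its three parts."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return Substitution(original, int(position), replacement)


def describe(label: str) -> str:
    """Spell out a substitution label in words."""
    sub = parse_substitution(label)
    before = AMINO_ACIDS.get(sub.original, sub.original)
    after = AMINO_ACIDS.get(sub.replacement, sub.replacement)
    return f"{before} to {after} at position {sub.position}"


if __name__ == "__main__":
    for label in ("N501Y", "E484K", "D614G"):
        print(label, "=", describe(label))
```

A label like 501Y.V2 drops the original amino acid and adds a version suffix, so it would need its own pattern; this sketch handles only the plain three-part form.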
'We’re not trying to name everything. In fact, we’re really explicitly trying not to have more than 10 or 20 names a year, and we’re interested in picking out the most important things,' Hodcroft says. 'That’s, like, big changes in the tree. When we see groups that are different in their genetics and they spread, even if it takes a while, in a region or around the world, we give those a Nextstrain clade.'
That’s not what the other bigwig in the space does, though. It’s analytical software called Pangolin—'Phylogenetic Assignment of Named Global Outbreak LINeages.' So-called Pango lineages start with a letter, initially A or B, designating the first two diverging SARS-CoV-2 sequences that emerged from China in late 2019 and early 2020. Each generation gets a number, and its descendants get an additional number, preceded by a period—but only for three generations. Four or more, and the whole lineage gets assigned to a new letter. Imagine an Obed-begat-Jesse-and-Jesse-begat-David vibe, but with diagrams and genomic receipts. 'Lineages are operating on a different resolution. You can have very big ones and small ones, but the idea is to capture the emerging edge of the pandemic,' says Áine O’Toole, an evolutionary biologist at the University of Edinburgh who created Pangolin and is now one of its main developers. 'The idea is to have a cluster of sequences that is linked to some sort of epidemiological piece of information.'
(After publication, O’Toole emailed me to note that while she had created the Pangolin software, she didn’t come up with the Pango notation used in the nomenclature—that was a bigger team. It's an important distinction that also proves my point about how hard it is to name things, including the people who name things.)
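The Pango naming rule described above (a letter, then up to three dot-separated numbers, with deeper descendants moved to a new letter) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Pangolin software itself: the function names are invented, and the alias table holds a single entry, P standing in for B.1.1.28, which is how the Manaus variant came to be written P.1.

```python
from typing import Dict, List, Optional, Tuple

# Assumption for this sketch: at most three numeric levels follow the letter.
MAX_NUMERIC_LEVELS = 3

# Illustrative alias table with a single entry. Nested aliases (an alias
# that points at another alias) are not handled here.
ALIASES: Dict[str, str] = {"P": "B.1.1.28"}


def split_lineage(name: str) -> Tuple[str, List[int]]:
    """Split 'B.1.351' into ('B', [1, 3, 5, 1][:0] or similar numeric levels)."""
    prefix, *numbers = name.split(".")
    if not (prefix.isalpha() and prefix.isupper()):
        raise ValueError(f"bad lineage prefix in {name!r}")
    if any(not n.isdigit() for n in numbers):
        raise ValueError(f"non-numeric level in {name!r}")
    if len(numbers) > MAX_NUMERIC_LEVELS:
        raise ValueError(f"{name!r} is too deep and needs a new letter")
    return prefix, [int(n) for n in numbers]


def parent(name: str) -> Optional[str]:
    """Return the lineage one step up the tree, or None at a root (A or B)."""
    prefix, numbers = split_lineage(name)
    if numbers:
        return ".".join([prefix] + [str(n) for n in numbers[:-1]])
    return ALIASES.get(prefix)


def expand(name: str) -> str:
    """Write an aliased name out in full, e.g. 'P.1' as 'B.1.1.28.1'."""
    prefix, numbers = split_lineage(name)
    full_prefix = ALIASES.get(prefix, prefix)
    return ".".join([full_prefix] + [str(n) for n in numbers])


if __name__ == "__main__":
    print(parent("B.1.351"))   # B.1
    print(expand("P.1"))       # B.1.1.28.1
```

Note that the expanded form B.1.1.28.1 has four numeric levels, which is exactly why it is written with a new letter instead.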
Pangolin has a tricky bit. Anyone working on a viral genome can use the software to try to figure out whether they have something new, and where it might fit with all the known lineages (with data pulled from GISAID, just as Nextstrain does). But making a final call on whether a strain is indeed new, and deserves a different spot in the heuristic—its Pango lineage—is up to actual living people on the team and suggestions from scientists in the field. 'I think maybe it’s something we need to work harder on, to try to convey there’s a difference between lineage designation and lineage assignment,' O’Toole says. 'When we designate lineages, that’s just based on what we know. If you’ve got a new lineage and we haven’t seen it, Pangolin won’t be able to assign it, because it can’t predict lineages that will arise in the future. So there is a lag.'
Pango lineages are meant to be dynamic, to change with new data. But not everyone gets that. That’s why the variant that seems to be able to reinfect people who’ve already had Covid-19, the one making the most noise in Manaus, could've gotten a 'B' name but now gets called 'P.1.' (Confusing the issue further, it’s still 501Y.V3, even though its key mutation is probably E484K.)
Scientific language always balances specificity with clarity. Even the words scientists use for all these variants can get kind of slippery. A mutant is a living thing, or a virus, that has in its genetic code mutations, differences from the 'wild type' genotype of the organism, acquired via errors and selective pressures. A variant is a mutant, and technically a strain is a variant with a markedly different phenotype, or manifestation—in this case, in how transmissible it is, the severity of the disease it can cause, or how it interacts with the immune system and, by extension, a vaccine. US infectious disease expert Anthony Fauci keeps talking about 'dangerous mutants' at Covid-19 briefings, which is probably a more vernacular usage but also makes one fear for the safety of the X-Men.
Even naming the virus itself wasn’t without controversy. It acquired the name 'Severe Acute Respiratory Syndrome Coronavirus 2,' and the disease got named 'Coronavirus Disease 2019,' in part to avoid political and racist appellations like 'Chinese virus' or, can you even believe it, 'Kung Flu.' And people still call the virus 'Covid.' (That’s an error, of course; actually, Covid is the disease. SARS-CoV-2 is the monster.) And that sensitivity didn’t spare the beermaker AB InBev from jokes about its brand Corona.
Those slightly rocky alphanumeric coding systems weren’t meant for public consumption. They were designed for use in journal articles, maybe in PowerPoint presentations, not for news anchors or government briefings. If you say 'bee dot one dot one dot seven,' that’s correct but clunky. But if you shorten it to 'bee one one seven,' that could be B.1.1.7 or B.1.17. Totally different mutants! 'If I’m talking to a technical group, these are all useful ways of describing these viruses to that audience and making sure we’re on the same page,' says Duncan MacCannell, chief science officer in the Office of Advanced Molecular Detection at the Centers for Disease Control and Prevention. 'If I’m trying to designate a variant of concern, though, and talk about that to a lay audience or the public or the media, they can be really challenging. They are really specific. They refer to what we’re talking about. But they’re also hard to articulate sometimes.'
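The 'bee one one seven' problem is easy to demonstrate. The short Python sketch below lists every way a run of spoken digits could be split into dotted levels after the letter. The function name is made up for illustration, and the results are only names that are possible on paper; the sketch makes no claim about which of them have actually been designated.

```python
from itertools import product
from typing import List

# Assumption for this sketch: a name carries at most three numeric levels.
MAX_NUMERIC_LEVELS = 3


def possible_lineages(prefix: str, digits: str) -> List[str]:
    """List the dotted names that a run of spoken digits could stand for."""
    if not digits.isdigit():
        raise ValueError(f"expected digits only, got {digits!r}")
    results = []
    gaps = len(digits) - 1
    # For each gap between digits, either place a dot there or not.
    for choice in product([False, True], repeat=gaps):
        parts = [digits[0]]
        for digit, dot_before in zip(digits[1:], choice):
            if dot_before:
                parts.append(digit)   # start a new level
            else:
                parts[-1] += digit    # extend the current level
        if len(parts) <= MAX_NUMERIC_LEVELS:
            results.append(".".join([prefix] + parts))
    return results


if __name__ == "__main__":
    for name in possible_lineages("B", "117"):
        print(name)
```

Three spoken digits already give four candidate readings, B.1.1.7 and B.1.17 among them, which is the ambiguity the dots exist to remove.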
So people have already started to do what they’re absolutely not supposed to—default to nicknaming the variants after the place where scientists first found them. That’s where you get the 'UK variant' or the 'Southern California variant.' The names tar those places with a certain viral negativity—and, worse, potentially discourage researchers and governments from being transparent about future surveillance efforts. And, to be clear, there aren’t any enforceable rules that say scientists have to use any of these systems.
That’s why the WHO has started organizing meetings to try to rationalize some of this—not necessarily to replace any of the nomenclatures, but maybe to come up with a new, more media-friendly way to talk about them. That might mean, as de Oliveira has suggested, just shorthanding new variants of concern as V1, V2, and so on. Hodcroft tells me she’s been hearing about ideas to make up brand new, easy-to-pronounce words to use as names. 'I can understand this is a little annoying,' Hodcroft says. 'I think the main thing that might have happened with the variants of concern is that it’s hard for the media to keep up.'
So, sure, fine, it’s my fault. But as intervention levels fluctuate and the uneven distribution of vaccines against Covid-19 continues, we humans are applying different selective pressures to the virus in different places. It’s responding with directional evolution, different strains coalescing around a set of predictable and potentially dangerous mutations. Variants gonna vary. So as Nature reported in January, those WHO meetings have now taken on a new urgency. Last week, 'for the first time all the three main classifications came to present their rationale. That will be followed by more meetings where we try to come up with a consensus. But the way to generate a consensus is first to listen to each other and try to find common ground,' de Oliveira says. 'You have to know your enemy in order to fight. That’s why we want to avoid country or geographic stigmatization. If people don’t describe the variant, even if it’s negative for the country, you cannot design effective interventions.' Beating SARS-CoV-2 has been hard enough already; it'll be a little easier if everyone knows what everyone else is talking about.
Adam Rogers writes about science and miscellaneous geekery. Before coming to WIRED, Rogers was a Knight Science Journalism Fellow at MIT and a reporter for Newsweek. He is the author of The New York Times science bestseller Proof: The Science of Booze.
News date: 2021-02-08
Emergence of a SARS-CoV-2 variant of concern with mutations in spike glycoprotein. Tegally H, Wilkinson E, Giovanetti M, Iranzadeh A, Fonseca V, Giandhari J, Doolabh D, Pillay S, San E, Msomi N, Mlisana K, Gottberg A, Walaza S, Allam M, Ismail A, Mohale T, Glass A, Engelbrecht S, Zyl G, Preiser W, Petruccione F, Sigal A, Hardie D, Marais G, Hsiao M, Korsman S, Davies M, Tyers L, Mudau I, York D, Maslo C, Goedhals D, Abrahams S, Laguda-Akingba O, Alisoltani-Dehkordi A, Godzik A, Wibmer Cos, Sewell B, Lourenco J, Alcantara Ls, Kosakovsky Pond S, Weaver S, Martin D, Lessells R, Bhiman J, Williamson C, de Oliveira T, Nature (2021), https://doi.org/10.1038/s41586-021-03402-9