Believe it or not, exactly one year after our preprint on predicting epistatic interactions in SARS-CoV-2
went online, we have published a revised version in the journal Genome Biology.
Talk about good timing!
What we wrote one year ago is still a good primer on what epistatic interactions are
and why SARS-CoV-2 is a good test case for developing methods that are able to work
on large datasets.
What did we change with respect to the preprint? The main change is very obvious: we have six (6!) new authors,
who contributed by running experiments aimed at validating some of
the predicted interactions.

We decided to focus on interactions that had emerged after the March 2023 cutoff we used for the preprint,
as a way to have a relatively “blind” validation. Our colleagues from Twincore and HZI (with a special
mention for Maureen and Henning!)
then built and tested pseudoviruses with the single and double mutations for their ability to infect
human cells and escape antibodies. We were able to validate all of our hand-picked interactions,
which is a good indication that our predictions are sound and that in theory the method we propose
could become part of the genomic epidemiology “arsenal”.
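
For readers new to the concept, here is a minimal sketch of how an epistatic interaction can be quantified from single- and double-mutant measurements. It assumes a multiplicative null model (independent effects add up on the log scale) and phenotypes expressed relative to the wild type; the function name and the numbers are hypothetical and are not taken from the paper's analysis.

```python
import math


def epistasis_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Log-scale deviation of the double mutant from the multiplicative expectation.

    All inputs are phenotypes (e.g. infectivity) relative to wild type (= 1.0).
    A score near 0 suggests independent effects; a positive score means the
    double mutant does better than expected from the two single mutants.
    """
    if min(w_a, w_b, w_ab) <= 0:
        raise ValueError("relative phenotypes must be positive")
    expected = w_a * w_b  # assumption: effects multiply when independent
    return math.log(w_ab / expected)


# Hypothetical measurements: each single mutant is worse than wild type,
# but the double mutant is better than wild type.
score = epistasis_score(w_a=0.5, w_b=0.8, w_ab=1.2)
print(f"epistasis score: {score:.2f}")
```

In a real experiment one would also need replicates and a measure of uncertainty before calling a score different from zero.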
Another addition of note is a visual representation of how our sample weighting method works,
which explains how we were able to identify the epistatic interaction that gave Omicron
its “super-powers” with as few as seven (7!) Omicron samples (the light orange dot below).

Lastly, a couple of notes on the process of getting this paper published: given that the main author (Gabriel)
did not need to have this work published in the usual way for the benefit of his own career (he already had an offer for a graduate student position), we could experiment a bit more freely.
We decided to use Review Commons to solicit reviews in a journal-agnostic way. We got very stimulating comments
from two anonymous reviewers (attached to the preprint); after we were done
replying to them we could pick where to submit the revised manuscript from a list of affiliated journals. I wish I could tell you which journals
are currently affiliated, but my google-fu came back empty-handed. I really liked the whole idea and process, and deciding which journal might be a good fit
after being done with the first round of peer review feels much easier. We are convinced that Genome Biology has the right readers for the main points
we tried to make. The only downside to the choice of journal would have been its steep APC of 4290 Euros.
Luckily for us we did not have to directly cover this cost, as our institution participates in Projekt DEAL, of which I only had a very vague idea.
Not only did we use SARS-CoV-2 as a way to peek into the future of genomic epidemiology, but also into that of the research publishing system!

We have just published a preprint describing
microGWAS, a software pipeline
to facilitate microbial GWAS analyses. This effort was
led by Judit, Bamu, and Jenny,
who worked as a team to turn my old chaotic code into a “production-ready” tool.
It was a real pleasure to see this communal effort take shape!

What we wrote last time we published a method related to microbial GWAS remains a good
simple primer on the subject:
As everyone in the field of genomics has heard ad nauseam, we now have an abundance of
genome sequences available; when that is combined with phenotypic measurements the obvious
question is then “which gene is responsible for this phenotype?”. Statistical genetics (i.e.
genome-wide association studies, GWAS) would be one way to answer that question, or rather
the more correct one “which genetic variant is associated, and hopefully causal, for the
variation in phenotype, across this collection of genomes?”.
The complex nature of microbial genetic variability means that in practice one has to use
a number of software tools to preprocess genomes prior to the actual statistical association analysis.
These tools each have a number of quirks and informal best practices, and very often one needs to write
a small script to connect the output of one tool to the input of the next one in the so-called
“software pipeline”. On top of these difficulties, the “raw” output of the association
analysis very rarely provides the information that the user needs. Most commonly a user would want to know: 1)
which genes are associated with a given phenotype, and 2) is there a biological process that is
overrepresented in the gene list?
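
To make the second question concrete, here is a minimal sketch of a one-sided hypergeometric over-representation test, a common way to ask whether a biological process shows up in a gene list more often than expected by chance. This is an illustration under stated assumptions, not necessarily how microGWAS implements its enrichment step; the function name, parameter names, and numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_background: int, n_in_process: int,
                      n_hits: int, n_hits_in_process: int) -> float:
    """One-sided hypergeometric p-value, P(X >= n_hits_in_process).

    n_background      -- genes tested in total
    n_in_process      -- background genes annotated to the process
    n_hits            -- genes associated with the phenotype
    n_hits_in_process -- associated genes annotated to the process
    """
    upper = min(n_in_process, n_hits)
    total = comb(n_background, n_hits)
    # Sum the probability of every outcome at least as extreme as the observed one.
    tail = sum(
        comb(n_in_process, k) * comb(n_background - n_in_process, n_hits - k)
        for k in range(n_hits_in_process, upper + 1)
    )
    return tail / total


# Hypothetical example: 4000 genes, 40 of them in the process of interest,
# 20 associated genes, 5 of which belong to that process.
p = enrichment_pvalue(4000, 40, 20, 5)
print(f"p-value: {p:.2e}")
```

When many processes are tested at once, the resulting p-values would also need a multiple-testing correction.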
These are the problems that our pipeline hopes to solve! We used the popular Snakemake workflow
management system to connect each individual step of the typical end-to-end microbial GWAS
analysis, including a number of downstream analyses to provide the users with an annotated gene list.

As you can see from the simplified scheme above, the pipeline carries out all of the work needed to go from annotated genome assemblies and
a phenotype table to annotated results and diagnostic plots. We even leverage Snakemake’s support for conda to automate
the cumbersome (and frankly irritating) process of installing and isolating individual tools.
As we want to make this pipeline sustainable in the medium term, we have also added a small test dataset to speed up
the development process; we hope that young and eager researchers in the microbial bioinformatics community
will be interested in helping to maintain the pipeline and implement new features.
More information about the four (4!) sets of genetic variants that are used in five (5!) distinct association tests
can be found in the preprint, as well as in the online documentation.

Last month Dilfuza successfully defended her PhD dissertation against the questions of her
two examiners: Ana Rita Brochado and Dan Depledge.
Congratulations to her for pulling this off and being the first PhD student to graduate from our lab!

We are very happy to report that the first PhD thesis from the lab has been submitted this month!
With just one day to spare before the deadline (as it should be 😅), Dilfuza has submitted her
thesis to the ZIB office. Now we wait for the public defense in June.

In other news, last month Adam officially left the lab to take an exciting new job as a postdoc
in the lab of Craig MacLean at the University of Oxford. Luckily Adam made a big push
before leaving and finished some large scale experiments thanks to his usual stamina, which will be
missed!

Congratulations to Dilfuza and Adam on this exciting news!

As anticipated in the previous post,
we intended to run the Hannover half marathon, and indeed we did!
We all managed to get to the finish line, with a special mention for Hannes, who completed
the race in 01:59:52, just 8 seconds under the 2-hour psychological
barrier!

We didn’t quite manage to get a photo on the day of the race, but we are happy to report
that we collected 366 Euros through our DKFZ donation campaign. Thanks to all who donated!