Marco Galardini
04 May 2026
We have just posted a new preprint, describing our work on improving how to detect bacterial transmission in hospitals using genomics.
This effort was led by Judit, who worked in collaboration with colleagues from
Hannover Medical School and Copenhagen University Hospital-Rigshospitalet,
to use a large (~30k genomes!) bacterial genomics dataset that our collaborator Susanne Häußler
has put together over the years.
Bacterial infections are an all too common “perk” for patients being hospitalized, and it’s the job of
epidemiologists to identify the transmission routes for these pathogens. Often this problem boils down
to a relatively simple question: is this particular pair of bacterial samples related to each other? Genomics, used to
read all the millions of letters in the bug’s DNA, provides the highest possible resolution
to answer this question. But how do we choose the number of genetic differences (“SNPs”) that separates related samples (i.e. the same bug moving between patients) from unrelated ones?
The standard practice in the field has relied on a fixed SNP threshold (e.g. 20), which generally works, but has two problems:
it’s rather arbitrary and, most importantly, it does not take into account the impact of time. Every time a bacterial cell
divides there is a chance that errors (i.e. SNPs) are introduced, so samples that are farther apart in time can
be expected to have more SNPs separating them. But how can we calibrate such a “SNP accumulation clock”?
Judit had the brilliant insight that the dataset included ~50 patients who had been sampled multiple times (~20!) over their hospital
stay. She could then calibrate our empirical clock within the same dataset we would apply it to.
We hope that this approach will be taken up by genomic epidemiologists in their daily practice.
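
To make the idea concrete, here is a minimal sketch of a time-aware SNP threshold. It is not the method from the preprint: it assumes a simple straight-line clock fitted to within-patient sample pairs plus a fixed safety margin, and every name and number in it (`fit_snp_clock`, `snp_threshold`, `margin`, the toy pairs) is made up for illustration.

```python
from statistics import mean


def fit_snp_clock(pairs):
    """Fit snps = intercept + rate * days by ordinary least squares.

    `pairs` holds (days between samples, SNP distance) tuples taken from
    repeated samples of the same patient (hypothetical input format).
    """
    days = [d for d, _ in pairs]
    snps = [s for _, s in pairs]
    d_bar, s_bar = mean(days), mean(snps)
    sxx = sum((d - d_bar) ** 2 for d in days)
    if sxx == 0:
        raise ValueError("pairs must cover more than one time gap")
    rate = sum((d - d_bar) * (s - s_bar) for d, s in pairs) / sxx
    intercept = s_bar - rate * d_bar
    return intercept, rate


def snp_threshold(days_apart, intercept, rate, margin=2.0):
    """Expected SNP distance after `days_apart` days, plus an assumed margin."""
    return intercept + rate * days_apart + margin


def is_putative_transmission(snps, days_apart, intercept, rate, margin=2.0):
    """Flag a pair as related if it falls at or below the time-aware threshold."""
    return snps <= snp_threshold(days_apart, intercept, rate, margin)


# Toy, invented within-patient pairs: (days apart, SNP distance).
toy_pairs = [(0, 1), (10, 2), (20, 3), (30, 4)]
intercept, rate = fit_snp_clock(toy_pairs)

# The same SNP distance can be judged differently depending on the time gap.
close_in_time = is_putative_transmission(8, 10, intercept, rate)
far_in_time = is_putative_transmission(8, 60, intercept, rate)
```

With this toy clock, 8 SNPs is too many for samples taken 10 days apart but compatible with samples taken 60 days apart, which is exactly the distinction a fixed threshold cannot make.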

Once we had used our calibrated clocks to identify transmitting bugs, we wanted to know if we could identify genetic
characteristics that could differentiate them from non-transmitting ones. We used two approaches to answer this question,
one using lists of known genes, and one looking at the whole “haystack”.
Even though we could identify many genes and genetic variants associated with the ability to transmit between patients,
we failed to use them to predict which samples were part of a transmission chain in a held-out dataset. This suggests
that patient and environment factors might dominate the probability of bacterial transmission. Measuring and including
these factors in future analyses may then lead to a system with better predictive power.

There’s more to discover in the preprint and the accompanying code repository,
so please dig in!
Marco Galardini
29 April 2026
We are very happy to report that the fifth (!) PhD thesis from the lab has been submitted this month!
As is customary with many graduate students, Hien submitted her thesis one day before the deadline;
but this time I am to blame for pushing her to finish one last experiment before completing her thesis.
We are all very excited about her hard work and are now looking forward to the public defenses in June.

Congratulations to her!
Marco Galardini
06 November 2025
Update (2025-12-16): the position has been provisionally filled. Thanks to all who applied!
After a rather long hiatus we are hiring again!
We are looking for a candidate to fill a computational PhD student position,
with the main task of better understanding which genetic elements (“genes”)
make bacterial pathogens such as E. coli, K. pneumoniae, P. aeruginosa, and E. faecium
virulent and resistant to antibiotics. We have in fact developed methods (here
and here)
to sift through large numbers of bacterial genomes for this exact purpose (see here,
here,
and here), and it will be the job
of the candidate to use and improve these methods so that they can be applied to
an even larger number of genomes. We are also particularly interested in implementing
more machine learning methods and in integrating molecular phenotypes such as gene expression
and proteomics.
Our lab is part of the RESIST excellence cluster at
Hannover Medical School (MHH),
and we are part of a collaborative project with Susanne Häussler
and Meike Stiesch to identify genetic determinants of pathogenicity
and virulence in life-threatening bacterial infections.
We are looking for a candidate with a strong computational background and relevant technical skills:
- Strong background in computational biology, bioinformatics, computer science, or a related field.
- Experience with programming languages such as Python and workflow management systems such as Snakemake.
- Familiarity with machine learning techniques and libraries (e.g., scikit-learn, TensorFlow, PyTorch).
- Knowledge of bacterial genomics and related bioinformatic tools.
- Experience with high-performance computing (HPC).
- Experience with version control systems (e.g., git) and software development best practices.
- Experience with statistical analysis and data visualization.
We offer a fully-funded PhD position for a little longer than 3 years, to begin in February 2026.
The student will be enrolled in the Biomedas graduate program,
which offers a curriculum specifically designed for
computational biology students.
The lab is located at Twincore, embedded in a large scientific campus that includes Hannover Medical School
and the Center for Individualized Infection Medicine (CiiM); we are also part of the
Helmholtz Centre for Infection Research (HZI), which offers lots of opportunities
for collaboration.
Hannover is the capital of Lower Saxony, in northern Germany, and is
an affordable
and vibrant city.
Please apply as soon as possible by sending the following documents to Marco Galardini at marco.galardini@twincore.de:
- Your CV
- A short motivation letter
- The link to a software portfolio (e.g., your GitHub account)
- Contact details of two references

Marco Galardini
01 July 2025
Yesterday we had the great pleasure of seeing not just one, but two of our students defend their PhD dissertations!
We started off with Judit, who focused on the use of large genome collections to study the transmission of bacterial pathogens
in the clinic and on methods to improve current practices for genomic epidemiology. She withstood a rather long Q&A session
from her examiners: Conor Meehan and
Dirk Schlüter.
Then it was time for Hannes to talk about his work on using k-mer based methods to improve the interpretability of bacterial
GWAS and aid the design of antimicrobials based on antisense oligonucleotides (i.e. asobiotics), using very large genome collections.
He had to contend with a similarly tough Q&A session from his examiners:
Franziska Faber and
Daniel Depledge, who is by now a regular examiner for our PhD students.

After a delicious BBQ organized at our department by them, we headed back to MHH for the graduation ceremony, where we had a pleasant surprise:
Judit won the “Infection Biology Prize” for her thesis, which comes with a cool check for 1000 Euros!

Thanks to both for their very hard work and for a lovely day of celebrations!
Marco Galardini
10 June 2025
People familiar with the field of bacterial genomics have long been aware that microbial
genomes are densely packed with genes, and thus are depleted of so-called “junk DNA” (a term that has fallen out
of fashion by the way!).
As a result, the more abundant protein coding portions of these genomes get the most attention
from researchers aiming to find which genetic variants explain phenotypic variation among isolates.
Previous work has however already shown that bacterial non-coding regions are highly diverse
and show signals of being evolutionarily constrained. We also knew that these regions influence
the expression of genes encoded directly downstream from them. We therefore hypothesized
that we could uncover statistical associations between genetic variants in non-coding regions
and gene expression variability across isolates.
The results of this work have just been published as a preprint,
a work that was led by Bamu during her time as a PhD student
in the lab. Bamu was the very first person brave enough to join the lab, and has managed to
work in both the dry and the wet lab, a feat that not many people can achieve!
Bamu indeed found that it was possible to identify at least one genetic variant whose presence
was associated with gene expression changes in up to 39% of tested genes in two important
bacterial pathogens (E. coli and P. aeruginosa). Using the right way to represent the
complex genetic variation (i.e. gene-centric k-mers)
allowed Bamu to capture the highest proportion of associations.

Once we had found these associations, the next task was to validate some of them and to understand the actual mechanisms
operating behind the scenes. Here Bamu used a combination of in-silico and in-vitro approaches, which very clearly
indicated that no single mechanism would be sufficient to explain the observed associations.
The last part of the study was instead dedicated to understanding the role of non-coding
genetic variation in antimicrobial resistance. Again, Bamu used her dry- and wet-lab skills to show that
indeed there are non-coding variants in both species that are associated with antimicrobial resistance.
This leads us to conclude that these often neglected regions of the bacterial genome need to be
taken into account if we want to be eventually able to make the most out of bacterial genomes.