Laurent Jacob

Researcher in machine learning and statistics for genomics

I am a senior CNRS researcher located at the LCQB lab of Sorbonne Université in Paris.

I work on machine learning and statistics to answer questions in genomics. I am particularly interested in microbial genomics, notably to better understand and predict drug resistance, and in evolutionary genomics. Part of my work also aims at better understanding neural networks for biological sequences. See the Projects section for some examples.

I received an MSc in machine learning from ENS Cachan (now ENS Paris-Saclay) in 2006, and a Ph.D. from Mines ParisTech in 2009, where I worked on regularized machine learning for computational biology. Between 2010 and 2012 I was a postdoctoral fellow in the Department of Statistics at UC Berkeley. I joined the CNRS in 2013 and worked at the LBBE lab at Université Lyon 1 until 2023.

You can reach me at firstname (dot) lastname (at) cnrs (dot) fr.

Projects

We train neural networks on sets of homologous sequences simulated under probabilistic models of sequence evolution to infer phylogenetic trees. We use self-attention to obtain permutation-equivariant functions, which ensures that the inferred tree does not depend on the order in which the sequences are provided. Our hope is to perform inference that is orders of magnitude faster than state-of-the-art maximum-likelihood methods, and as accurate or even more accurate under models with intractable likelihoods.
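The permutation-equivariance property can be illustrated with a minimal single-head self-attention layer written in NumPy. This is a sketch under assumptions (random weights, one embedding row per sequence, no training), not the architecture used in this project: permuting the input sequences simply permutes the output rows in the same way.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over the rows of x (one row per sequence)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
n_seqs, dim = 5, 8
x = rng.normal(size=(n_seqs, dim))  # hypothetical per-sequence embeddings
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))

perm = rng.permutation(n_seqs)
out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)
# Reordering the input sequences reorders the outputs identically.
print(np.allclose(out[perm], out_perm))  # True
```

Because each output row depends on the other sequences only through a softmax-weighted sum, no position information enters the computation; any order-dependent component (such as positional encodings over sequences) would break this property.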
Convolutional neural networks on biological sequences are known to learn probabilistic motifs associated with the target phenotype. To go beyond this informal interpretation, we provide a statistical test quantifying the motif-phenotype association. This requires solving a post-selection inference problem in which the selection step involves the infinite set of possible motifs.
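To see why a post-selection problem arises, the following NumPy simulation (an illustrative sketch, not the test developed in this project) generates null data, picks the candidate feature most correlated with a phenotype, and tests it naively on the same data: the false positive rate far exceeds the nominal 5%. Sample splitting, shown as a simple swapped-in alternative, restores validity at the cost of half the data. The candidate features are hypothetical stand-ins for motif activations, and the p-value uses a large-sample normal approximation.

```python
import math
import numpy as np

def abs_correlations(features, phenotype):
    """Absolute Pearson correlation between each feature column and the phenotype."""
    fc = features - features.mean(axis=0)
    pc = phenotype - phenotype.mean()
    return np.abs(fc.T @ pc) / (np.linalg.norm(fc, axis=0) * np.linalg.norm(pc))

def pvalue_from_r(r, n):
    """Two-sided p-value for a sample correlation r on n samples, using the
    large-sample approximation sqrt(n) * r ~ N(0, 1) under no association."""
    return math.erfc(math.sqrt(n) * abs(r) / math.sqrt(2))

def select_then_test(rng, n=200, n_candidates=50, split=False):
    """Select the candidate most correlated with the phenotype, then test it
    (on held-out samples if split=True)."""
    features = rng.normal(size=(n, n_candidates))  # hypothetical motif activations
    phenotype = rng.normal(size=n)                 # independent of every feature
    half = n // 2
    select_rows = np.arange(half) if split else np.arange(n)
    test_rows = np.arange(half, n) if split else np.arange(n)
    best = int(np.argmax(abs_correlations(features[select_rows], phenotype[select_rows])))
    r = abs_correlations(features[test_rows, best:best + 1], phenotype[test_rows])[0]
    return pvalue_from_r(r, len(test_rows))

rng = np.random.default_rng(1)
n_rep = 500
naive_rate = np.mean([select_then_test(rng) < 0.05 for _ in range(n_rep)])
split_rate = np.mean([select_then_test(rng, split=True) < 0.05 for _ in range(n_rep)])
print(f"false positive rate, select and test on the same data: {naive_rate:.2f}")
print(f"false positive rate, sample splitting: {split_rate:.2f}")
```

Selective inference aims to avoid this loss of data by conditioning the test on the selection event instead of holding samples out.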
In DBGWAS, we describe the variation in bacterial genomes through their content in short sequences (k-mers). This avoids the bias usually caused by focusing on pre-defined SNPs or gene lists. We use this representation to pinpoint the genomic variation associated with antimicrobial resistance, and rely on a de Bruijn graph to interpret the identified k-mers in terms of SNPs or mobile genetic elements.
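A toy version of the k-mer representation and of the de Bruijn graph fits in a few lines of Python. This sketch assumes single-stranded sequences (no reverse complements), a tiny k, and an uncompacted graph, and is not the DBGWAS implementation; the two genomes are hypothetical. In this example a SNP changes the k k-mers overlapping it, and these form two parallel paths (a bubble) in the graph.

```python
from collections import defaultdict

def kmers(seq, k):
    """Set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_presence(genomes, k):
    """Presence/absence matrix: one row per genome, one column per k-mer."""
    per_genome = [kmers(g, k) for g in genomes]
    vocabulary = sorted(set().union(*per_genome))
    return vocabulary, [[int(km in s) for km in vocabulary] for s in per_genome]

def de_bruijn_edges(kmer_set):
    """Edges u -> v between k-mers where the (k-1)-suffix of u is the (k-1)-prefix of v."""
    by_prefix = defaultdict(list)
    for km in kmer_set:
        by_prefix[km[:-1]].append(km)
    return {(u, v) for u in kmer_set for v in by_prefix[u[1:]]}

genomes = ["AACGTTGCA", "AACGATGCA"]  # hypothetical genomes differing by one SNP
vocab, presence = kmer_presence(genomes, k=3)
variable = [km for km, a, b in zip(vocab, *presence) if a != b]
print(variable)  # ['ATG', 'CGA', 'CGT', 'GAT', 'GTT', 'TTG']

edges = de_bruijn_edges(set(vocab))
# The bubble opens at ACG: one branch per allele, rejoining at TGC.
print(sorted(v for u, v in edges if u == "ACG"))  # ['CGA', 'CGT']
```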
A follow-up on DBGWAS: we use the structure of the de Bruijn graph to define new genomic variants (e.g. gene presence) that accommodate polymorphisms. This amounts to defining one variant for each connected subgraph. We rely on a reverse search strategy to efficiently enumerate these variants, and on Tarone's procedure to control the family-wise error rate (FWER) when testing their association with a bacterial phenotype.
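Tarone's procedure exploits the fact that, with discrete tests, many variants (for instance those present in very few genomes) can never reach significance, so they need not count in the multiple-testing correction. The following pure-Python sketch assumes a one-sided Fisher exact test for enrichment among resistant genomes and uses made-up variant counts; it illustrates only the testing step, not the reverse-search enumeration of subgraphs.

```python
from math import comb

def upper_tail_pvalue(a, x, n1, n):
    """One-sided Fisher exact p-value P(A >= a), where A is hypergeometric:
    n genomes, n1 of them resistant, variant present in x genomes,
    a of which are resistant."""
    hi = min(x, n1)
    total = sum(comb(n1, i) * comb(n - n1, x - i) for i in range(a, hi + 1))
    return total / comb(n, x)

def min_attainable_pvalue(x, n1, n):
    """Smallest p-value reachable by any table with these margins:
    as many variant carriers as possible are resistant."""
    return upper_tail_pvalue(min(x, n1), x, n1, n)

def tarone(supports, observed, n1, n, alpha=0.05):
    """Tarone's procedure: find the smallest k such that at most k variants
    can reach significance at level alpha / k, then test at alpha / k."""
    p_min = [min_attainable_pvalue(x, n1, n) for x in supports]
    k = 1
    while sum(p <= alpha / k for p in p_min) > k:
        k += 1
    threshold = alpha / k
    pvals = [upper_tail_pvalue(a, x, n1, n) for a, x in zip(observed, supports)]
    return threshold, [i for i, p in enumerate(pvals) if p <= threshold]

# Hypothetical data: 20 genomes, 10 resistant; 'supports' counts the genomes
# carrying each variant and 'observed' how many of those are resistant.
supports = [1] * 100 + [2] * 50 + [10, 10, 8]
observed = [1] * 100 + [2] * 50 + [10, 6, 8]
threshold, hits = tarone(supports, observed, n1=10, n=20)
print(round(threshold, 4), hits)  # 0.0167 [150, 152]
```

Here the threshold is alpha/3 instead of the Bonferroni alpha/153, and the variant present in 8 genomes is detected even though its smallest attainable p-value (about 3.6e-4) exceeds the Bonferroni threshold (about 3.3e-4).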
Convolutional kernel networks
We analyze convolutional and recurrent neural networks for biological sequences in the framework of positive definite kernels. In addition to providing a new interpretation of the genomic features underlying these networks, this analysis yields a better-regularized version that performs better when little data is available. We also extended this work to graph-structured data.
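The kernel view can be illustrated with a single CKN-style layer in NumPy. In this sketch (assumptions: DNA alphabet, a Gaussian kernel on normalized one-hot k-mer patches, mean pooling, one layer, anchors fixed rather than learned), the anchor patches play the role of convolutional filters and the projection by K_ZZ^(-1/2) is a Nyström approximation. When the anchors include every patch of the two sequences, the inner product of the pooled features matches the exact kernel between sequences.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot_patches(seq, k):
    """All length-k windows of seq, one-hot encoded and flattened."""
    onehot = np.array([[c == a for a in ALPHABET] for c in seq], dtype=float)
    return np.stack([onehot[i:i + k].ravel() for i in range(len(seq) - k + 1)])

def patch_kernel(p, q, k, alpha=1.0):
    """Gaussian kernel on unit-norm patches: exp(alpha * (<p, q>/k - 1))."""
    return np.exp(alpha * (p @ q.T / k - 1.0))

def ckn_features(seq, anchors, k, alpha=1.0):
    """Kernel 'convolution' with anchor patches (the filters), Nystrom
    projection by K_ZZ^(-1/2), then mean pooling over positions."""
    kzz = patch_kernel(anchors, anchors, k, alpha)
    eigval, eigvec = np.linalg.eigh(kzz)
    kzz_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    conv = patch_kernel(one_hot_patches(seq, k), anchors, k, alpha)  # positions x anchors
    return (conv @ kzz_inv_sqrt).mean(axis=0)

x, y, k = "ACGTAC", "ACGGAC", 3  # hypothetical sequences
anchors = np.unique(np.vstack([one_hot_patches(x, k), one_hot_patches(y, k)]), axis=0)
approx = ckn_features(x, anchors, k) @ ckn_features(y, anchors, k)
exact = patch_kernel(one_hot_patches(x, k), one_hot_patches(y, k), k).mean()
print(np.isclose(approx, exact))  # True
```

With fewer anchors than distinct patches, the same code gives an approximation of the kernel; in a trained model the anchors would typically be learned from data rather than fixed.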


I am grateful to all the people I co-advised. Most of the work featured on this website is theirs.

Ph.D. students


Vincent Garot

2023-, with Samuel Alizon and Anna Zhukova.


Julie Plantade

2022-, with Xavier Charpentier, Mylène Hugoni and Nicolas Lartillot.


Luca Nesterenko

2021-, with Bastien Boussau.


Antoine Villié

2019-2023, with Philippe Veber and Yohann de Castro.

Now a Data Science Manager at Aurobac Therapeutics


Dexiong Chen

2018-2020, with Julien Mairal.

Now a project leader at MPI Munich


Magali Jaillard

2016-2018, with Maud Tournoud and Pierre Mahé.

Now a research scientist at bioMérieux

Postdocs and engineers


Luc Blassel

2023-, with Bastien Boussau.


Johanna Trost

2021-2023, with Bastien Boussau.

Now an MSc student


Xavier Castells Domingo

2015-2016, with Vincent Lacroix and Stéphane Robin.

MSc interns


Enzo Marsot

2023, with Bastien Boussau, Théo Tricou and Damien de Vienne.

Now an MSc student at ENS Lyon


Louis-Maël Guéguen

2022, with Camille Marchet.

Now a bioinformatician at CHUL in Québec


Rosette Von Raesfeldt



Elsa Bernard

2012, with Julien Mairal and Jean-Philippe Vert.

Now a group leader at Institut Gustave Roussy


Anne-Claire Haury

2009, with Jean-Philippe Vert.

Now a software engineer at Google Research

Some recent publications

See my Google Scholar profile for a complete list.
Neural Networks beyond explainability: Selective inference for sequence motifs
CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS
A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events


I was recently involved in the organization of several scientific events:

  • LEGEND 2024, a European conference on machine learning for evolutionary genomics.
  • LEGO, a French working group on machine learning for genomics. We organize national meetings to gather researchers interested in this topic.
  • MLMG 2022, an ECML workshop on machine learning for microbial genomics.
  • I co-chaired the Systems Biology and Networks track of the 2021 ISMB conference.
  • The 2020 prospective meeting on data science, AI and biology organized by the biology and computer science institutes of CNRS.