Laurent Jacob

Researcher in machine learning and statistics for genomics

I am a senior CNRS researcher located at the LCQB lab of Sorbonne Université in Paris.

I work on machine learning and statistics to answer questions in genomics. I am particularly interested in microbial genomics, notably to better understand and predict drug resistance, and in evolutionary genomics. Part of my work also aims at better understanding neural networks for biological sequences. See the Projects section for some examples.

I received an MSc in machine learning from ENS Cachan (now ENS Paris-Saclay) in 2006, and a Ph.D. from Mines ParisTech in 2009, where I worked on regularized machine learning for computational biology. Between 2010 and 2012 I was a postdoctoral fellow at the statistics department of UC Berkeley. I joined the CNRS in 2013 and worked at the LBBE lab in Université Lyon 1 until 2023.

You can reach me at firstname (dot) lastname (at) cnrs (dot) fr.

Projects

Phyloformer
We train neural networks on sets of homologous sequences simulated from probabilistic evolution models to infer phylogenetic trees. We use self-attention to obtain permutation-equivariant functions and ensure that the inferred tree does not depend on the order in which the sequences are provided. Our hope is to perform inference that is orders of magnitude faster than state-of-the-art maximum likelihood, and as accurate or even more accurate under models with intractable likelihoods.
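As a toy illustration of the permutation-equivariance property mentioned above, the sketch below applies a single self-attention layer to a few sequence embeddings and checks that shuffling the inputs shuffles the outputs in the same way. It is not the Phyloformer architecture: the identity query/key/value maps, the two-dimensional embeddings and the function names are assumptions made to keep the example short.

```python
import math

def self_attention(rows):
    """Single-head self-attention with identity query/key/value maps (an assumption)."""
    dim = len(rows[0])
    out = []
    for query in rows:
        # Scaled dot-product scores of this query against every row.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in rows]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Output row = softmax-weighted average of all rows.
        out.append([sum(w * v[j] for w, v in zip(weights, rows)) / total for j in range(dim)])
    return out

def permute(rows, order):
    """Reorder rows so that position i holds rows[order[i]]."""
    return [rows[i] for i in order]

# Hypothetical embeddings, one per sequence.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
order = [2, 0, 1]

shuffled_then_attended = self_attention(permute(embeddings, order))
attended_then_shuffled = permute(self_attention(embeddings), order)

# Equivariance: both routes give the same rows, up to floating-point rounding.
is_equivariant = all(
    math.isclose(x, y, abs_tol=1e-12)
    for row_a, row_b in zip(shuffled_then_attended, attended_then_shuffled)
    for x, y in zip(row_a, row_b)
)
print(is_equivariant)
```

Each output row is a weighted average over the whole set of rows, so reordering the inputs only reorders the outputs.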
SEISM
Convolutional neural networks on biological sequences are known to learn probabilistic motifs associated with the target phenotype. To go beyond this informal interpretation, we provide a statistical test quantifying the motif-phenotype association. This requires solving a post-selection inference problem, where the selection involves the infinite set of possible motifs.
DBGWAS
We describe the variation in bacterial genomes through their content in short sequences (k-mers). This avoids the bias usually caused by focusing on pre-defined SNPs or gene lists. We use this representation to pinpoint the genomic variation associated with antimicrobial resistance, and rely on a so-called de Bruijn graph to help make sense of the identified k-mers in terms of SNPs or mobile genetic elements.
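To make the k-mer representation concrete, here is a minimal sketch on made-up toy sequences. It is not the DBGWAS implementation: it ignores reverse complements and graph compaction, and the genome names, sequences and helper functions are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def de_bruijn_edges(kmer_set):
    """Edges u -> v where the (k-1)-suffix of u equals the (k-1)-prefix of v."""
    return {(u, v) for u in kmer_set for v in kmer_set if u[1:] == v[:-1]}

# Hypothetical toy genomes differing by a single SNP (T vs A at position 4).
genomes = {
    "resistant_1": "TACGTTGCA",
    "resistant_2": "TACGTTGCA",
    "susceptible_1": "TACGATGCA",
}
k = 4

per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
all_kmers = set().union(*per_genome.values())

# Presence/absence pattern of each k-mer across genomes.
presence = {
    km: {name for name, found in per_genome.items() if km in found}
    for km in all_kmers
}

# k-mers seen in every resistant genome and in no susceptible one.
resistant = {"resistant_1", "resistant_2"}
resistant_only = sorted(km for km, names in presence.items() if names == resistant)

edges = de_bruijn_edges(all_kmers)
print(resistant_only)  # ['ACGT', 'CGTT', 'GTTG', 'TTGC']
print(len(edges))      # 10
```

In the graph, the SNP appears as a bubble: two parallel paths of four k-mers each that leave the shared k-mer TACG and rejoin at the shared k-mer TGCA. This is the kind of structure that helps read a set of associated k-mers as a single genetic event.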
CALDERA
A follow-up on DBGWAS. We use the structure of the de Bruijn graph to define new genomic variants (e.g. gene presence) that accommodate polymorphisms. This amounts to defining one variant for each connected subgraph. We rely on a reverse search strategy to efficiently enumerate variants, and on the Tarone strategy to control the FWER when testing their association with a bacterial phenotype.
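The idea behind the Tarone strategy can be sketched as follows. With a discrete test, a variant carried by too few (or too many) samples can never reach a small p-value, so it need not count in the multiple-testing correction. The example below is an illustration, not the CALDERA code: it assumes a one-sided Fisher exact test, and the sample sizes and carrier counts are made up.

```python
from math import comb

def min_attainable_pvalue(carriers, n_cases, n_samples):
    """Smallest one-sided Fisher p-value a variant with this many carriers can reach.

    It is the hypergeometric probability of the most extreme table, where as
    many carriers as possible are cases.
    """
    a = min(carriers, n_cases)
    n_controls = n_samples - n_cases
    return comb(n_cases, a) * comb(n_controls, carriers - a) / comb(n_samples, carriers)

def tarone_correction_factor(min_pvalues, alpha):
    """Smallest k such that at most k hypotheses have a minimal p-value <= alpha / k."""
    k = 1
    while sum(p <= alpha / k for p in min_pvalues) > k:
        k += 1
    return k

# Hypothetical study: 20 samples, 10 cases, six candidate variants.
n_samples, n_cases, alpha = 20, 10, 0.05
carrier_counts = [1, 2, 3, 5, 10, 19]

min_pvalues = [min_attainable_pvalue(x, n_cases, n_samples) for x in carrier_counts]
k = tarone_correction_factor(min_pvalues, alpha)
threshold = alpha / k

# Only variants that could reach the threshold are actually tested.
testable = [x for x, p in zip(carrier_counts, min_pvalues) if p <= threshold]
print(k, threshold, testable)  # 2 0.025 [5, 10]
```

Here a Bonferroni correction over all six variants would use a threshold of 0.05 / 6, whereas this procedure tests the two testable variants at 0.05 / 2 and still controls the FWER at level 0.05.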
Convolutional kernel networks
We analyze convolutional and recurrent neural networks for biological sequences in the framework of positive definite kernels. In addition to providing a new interpretation of the genomic features underlying these networks, this analysis provides a version that is better regularized and performs better when little data is available. We also extended this work to graph-structured data.

People

I am grateful to all the people I co-advised. Most of the work featured on this website is theirs.

Ph.D. students

Vincent Garot

2023-, with Samuel Alizon and Anna Zhukova.

Julie Plantade

2022-, with Xavier Charpentier, Mylène Hugoni and Nicolas Lartillot.

Luca Nesterenko

2021-, with Bastien Boussau.

Antoine Villié

2019-2023, with Philippe Veber and Yohann de Castro.

Now a Data Science Manager at Aurobac Therapeutics

Dexiong Chen

2018-2020, with Julien Mairal.

Now a project leader at MPI Munich

Magali Jaillard

2016-2018, with Maud Tournoud and Pierre Mahé.

Now a research scientist at bioMérieux

Postdocs and engineers

Luc Blassel

2023-, with Bastien Boussau.

Johanna Trost

2021-2023, with Bastien Boussau.

Now an MSc student

Xavier Castells Domingo

2015-2016, with Vincent Lacroix and Stéphane Robin.

MSc interns

Enzo Marsot

2023, with Bastien Boussau, Théo Tricou and Damien de Vienne.

Now an MSc student at ENS Lyon

Louis-Maël Guéguen

2022, with Camille Marchet.

Now a bioinformatician at CHUL in Québec

Rosette Von Raesfeldt

2014

Elsa Bernard

2012, with Julien Mairal and Jean-Philippe Vert.

Now a group leader at Institut Gustave Roussy

Anne-Claire Haury

2009, with Jean-Philippe Vert.

Now a software engineer at Google Research

Some recent publications

See my Google Scholar profile for a complete list.
Neural Networks beyond explainability: Selective inference for sequence motifs
CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS
A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events

Community

I was recently involved in the organization of several scientific events:

  • LEGEND 2024, a European conference on machine learning for evolutionary genomics.
  • LEGO, a French working group on machine learning for genomics. We organize national meetings to gather researchers interested in this topic.
  • MLMG 2022, an ECML workshop on machine learning for microbial genomics.
  • I co-chaired the Systems Biology and Networks track of the 2021 ISMB conference.
  • The 2020 prospective meeting on data science, AI and biology from the biology and computer science institutes of CNRS.