Dmitry Kobak

I am a group leader in the Department of Data Science of the Hertie AI institute at Tübingen University, Germany, working on machine learning and data science for biological applications.

I am interested in self-supervised and unsupervised learning, in particular contrastive learning, manifold learning, and dimensionality reduction for 2D visualization of scientific datasets. I am working with image data, text data, graph data, and single-cell RNA-seq data in neuroscience contexts.

I am also interested in statistical forensics and have been involved in the analysis of Russian electoral falsifications, war fatalities, and Covid-19 excess mortality.

I am a Privatdozent at the Faculty of Computer Science. In the 2023/24 winter semester I was a Vertretungsprofessor (visiting professor) at Heidelberg University, Germany. I am a member of the ELLIS society, a member of the Cluster of Excellence «Machine Learning for Science», and an IMPRS-IS associated scientist.

Teaching

For several years I have been teaching an introductory course on machine learning for MSc students in neuroscience and data science in Tübingen. In the winter semester 2020/21, due to the Covid pandemic, the class was held online and the lectures were recorded in a studio.

Videos: Tübingen Machine Learning / Introduction to Machine Learning
Slides: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11

Winter semester 2023/24 (Heidelberg): BSc course Einführung ins Maschinelle Lernen (in German), MSc seminar Transformers, large language models, and their use in physics.

Talks

Contrastive and neighbor embedding methods for data visualisation, April 2023, Network Seminar.
Contrastive methods and nearest-neighbour methods for data visualisation (in Russian), May 2023, SciBerloga Seminar.
What are 2D neighbour embeddings of scRNA-seq data actually useful for?, April 2022, ICLR GTRL workshop.
Neighbour embeddings for scientific visualization, June 2021, ELLIS Life Heidelberg.

Supervision / team

Postdocs:

Sebastian Damrich (2023–2025) → junior group leader at Hertie AI

PhD students:

Rita González Márquez (started in 2022)
Niklas Böhm (2021–2025) → postdoc at the European Space Agency

MSc students:

Moritz Christ (2025)
Marius Keute (2023)
Fynn Bachmann (2021) → PhD student at the University of Zürich
Rita González Márquez (2021) → PhD student
Niklas Böhm (2020) → PhD student

Research

This is a list of [co-]first-/last-author papers grouped by topic; see my Google Scholar page for the complete list ordered chronologically or by citation count. Most papers are open access; for the ones that are not, I provide PDFs. Twitter icons link to the respective Twitter threads (tweeprints).

(NeurIPS 🥳 2024)

Statistical methods for transcriptomic and multi-omic data analysis

Lause et al., Compound models and Pearson residuals for normalization of single-cell RNA-seq data without UMIs (bioRxiv, 2023)
Lause et al., Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data (Genome Biology, 2021)
Bernaerts et al., Sparse bottleneck networks for exploratory analysis and visualization of neural Patch-seq data (arXiv, 2020)
Kobak et al., Sparse reduced-rank regression for exploratory visualisation of paired multivariate data (Journal of the Royal Statistical Society C, 2021)

Machine learning & statistical theory

Greydanus & Kobak, Scaling down deep learning with MNIST-1D (ICML 🥳 2024)
Kobak et al., The optimal ridge penalty for real-world high-dimensional data can be zero or negative due to the implicit ridge regularization (JMLR, 2020)

Patch-seq data analysis

Scala* & Kobak* et al., Phenotypic variation in transcriptomic cell types in mouse motor cortex (Nature, 2021)
Scala* & Kobak* et al., Layer 4 of mouse neocortex differs in cell types and circuit organization between sensory areas (Nature Communications, 2019)

Election forensics

Kobak et al., Suspect peaks in Russia's “referendum” results (Significance, 2020)
Kobak et al., Putin's peaks: Russian election data revisited (Significance, 2018)
Kobak et al., Statistical fingerprints of electoral fraud? (Significance, 2016)
Kobak et al., Integer percentages as electoral falsification fingerprints (The Annals of Applied Statistics, 2016)

Excess mortality / Covid-19 forensics / LLM forensics

Kobak et al., Delving into LLM-assisted writing in biomedical publications through excess vocabulary (Science Advances 🥳 2025)
Kobak et al., War fatalities in Russia in 2022–2023 estimated via excess male mortality: a research note (Demography, 2025)
Kobak, Underdispersion: A statistical anomaly in reported Covid data (Significance, 2022)
Karlinsky & Kobak, Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset (eLife, 2021)
Kobak, Excess mortality reveals Covid's true toll in Russia (Significance, 2021)

Previous work and education

In 2013–2016 I was a postdoc in the Machens lab at the Champalimaud Institute in Lisbon, working on statistical analysis of electrophysiological population recordings from the cortex.

Kobak* & Pardo-Vazquez* et al., State-dependent geometry of population activity in rat auditory cortex (eLife, 2019)
Kobak* & Brendel* et al., Demixed principal component analysis of neural population data (eLife, 2016)

In 2007–2012 I did my PhD in the Mehring lab, initially at Freiburg University and later at Imperial College London, working on computational motor control.

Bashford* & Kobak* et al., Motor skill learning decreases movement variability and increases planning horizon (Journal of Neurophysiology, 2022)
Kobak & Mehring, Adaptation paths to novel motor tasks are shaped by prior structure learning (The Journal of Neuroscience, 2012)

In 2000–2007 I studied computer science (BSc) at St. Petersburg ITMO University and then theoretical physics (MSc) at St. Petersburg State University.

Before that, I attended St. Petersburg Classical Gymnasium #610. I taught computer science and physics there part-time in 2002–2006 while studying at university. In 2004, together with a friend, I built the website 610.ru, which is still online (with minor changes).

Reviewing

Year  Reviews
-------------
2019  1
2020  15
2021  20
2022  22
2023  28
2024  23
2025  25
2026  1

Venue              Reviews (>1)
-------------------------------
NeurIPS            31
ICML               18
ICLR               13
ECML               12
TMLR               12
Bioinformatics     5
Genome Biology     5
AISTATS            3
JMLR               3
Nature Biotech     2
Nature Comms       2
Nature Comms Bio   2
Nature Methods     2
PLoS Comp Bio      2
Political Analysis 2

Review lengths in kB
--------------------
 0 | 
 1 | .................
 2 | ....................
 3 | .......................................
 4 | ........................
 5 | ...............
 6 | ............
 7 | ....
 8 | ...
 9 | .

Area chair: ICLR 2026, ICML 2026
Action editor: TMLR (from 2025)

Last updated: December 15, 2025