Transformers, large language models, and their use in physics
This seminar is offered to MSc students of the Department of Physics and Astronomy at Heidelberg University (Computational Physics specialisation). The seminar (SWS: 2; Leistungspunkte: 6; LSF) will be held for the first time in the winter semester 2023/24. The language of the seminar is English. To obtain credit, participants have to give one polished presentation and later submit a written report.
The teacher: Dr Dmitry Kobak is a group leader at the University of Tübingen, working on dimensionality reduction and self-supervised learning. During the winter semester 2023/24, he is a visiting professor in Heidelberg.
When/where: Thursday 14:15–15:45. INF 205, SR 11.
Schedule
- Nov 9: Andrej Karpathy: Let's build GPT, part 1
- Nov 16: Andrej Karpathy: Let's build GPT, part 2
- Nov 23: Subbarao Kambhampati: Avenging Polanyi's Revenge
- Nov 30: Jyot Makadiya: Evaluating cognitive abilities of LLMs
- Dec 7: Ken von Buenau: Training data memorization in LLMs
- Dec 14: Pit Neitemeier: Mechanistic understanding of LLMs: sparse coding
- Jan 11: Johnly Joshy: Mechanistic understanding of LLMs: attention circuits
- Jan 18: Aditya Rastogi: ViT, CLIP, and MedCLIP
- Jan 25: Philip Velie: Transformers for jet tagging at LHC
- Feb 1: Johannes Schmidt: Transformers for quantum chemistry
Some possible topics
- Introduction
- LLM intelligence
- Emergent abilities of LLMs
- Sparks of AGI
- GPT-4 cannot self-critique / plan
- Systematic cognitive evaluation
- World models debate
- Training data memorization
- In-context learning
- Understanding LLMs
- Mechanistic understanding via sparse coding
- Mechanistic understanding of attention layers
- Grammar learning
- Other LLM topics
- Tiny LLMs
- Time series modelling using LLMs
- Philosophy
- Transformers in other domains
- Vision transformers
- Transformers for LHC data analysis
- Transformers in quantum chemistry