Publications

You can also find my articles on Google Scholar.

Where Do Large Learning Rates Lead Us? A Feature Learning Perspective

Ildus Sadrtdinov, Maxim Kodryan, Eduard Pokonechny*, Ekaterina Lobacheva*, Dmitry Vetrov*
ICML 2024 Workshop HiLD, 2024
pdf / openreview / X thread / bibtex

We study the feature learning properties of training with different initial learning rates (LRs). We show that a narrow range of optimal initial LRs learns a sparse set of the most useful features, whereas smaller LRs lack such specialization and larger LRs fail to extract useful patterns from the data.

To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning

Ildus Sadrtdinov*, Dmitrii Pozdeev*, Dmitry Vetrov, Ekaterina Lobacheva
Neural Information Processing Systems (NeurIPS), 2023
arXiv / openreview / poster & video / X thread / code / bibtex

We study how effectively exploring the pre-train basin and its close vicinity helps ensembling in transfer learning. We show that ensembles trained from a single pre-trained checkpoint can be improved by better exploring the pre-train basin, while leaving the basin degrades ensemble quality.

[Re] “Towards Understanding Grokking”

Alexander Shabalin*, Ildus Sadrtdinov*, Evgeniy Shabalin
ML Reproducibility Challenge 2022, 2023
Outstanding Paper Honorable Mention
pdf / openreview / code / bibtex

We successfully reproduce the results of the paper “Towards Understanding Grokking: An Effective Theory of Representation Learning”. We investigate how consistent the training phases are across different data and weight initializations and propose smooth phase diagrams.