Publications

Publications in reverse chronological order. For a complete list, see Google Scholar.

2025

  1. The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
    Natalie Abreu, Nikhil Vyas, Sham Kakade, Depen Morwani
    Accepted to ICLR 2026
  2. Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
    Alexandru Meterez, Depen Morwani, Jingfeng Wu, Costin-Andrei Oncescu, Cengiz Pehlevan, Sham Kakade
    Accepted to ICLR 2026
  3. LOTION: Smoothing the Optimization Landscape for Quantized Training
    Mujin Kwun, Depen Morwani, Chloe Huangyuan Su, Stephanie Gil, Nikhil Anand, Sham Kakade
    Under submission at ICML 2026
  4. Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
    Bingbin Liu, Rachit Bansal, Depen Morwani, Nikhil Vyas, David Alvarez-Melis, Sham Kakade
    Under submission at ICML 2026
  5. Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants
    Depen Morwani, Nikhil Vyas, Hanlin Zhang, Sham Kakade
    Under submission at ICML 2026
  6. A Simplified Analysis of SGD for Linear Regression with Weight Averaging
    Alexandru Meterez, Depen Morwani, Costin-Andrei Oncescu, Jingfeng Wu, Cengiz Pehlevan, Sham Kakade
    Optimization for Machine Learning (OPT) Workshop, 2025

2024

  1. A New Perspective on Shampoo's Preconditioner
    Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham Kakade, Lucas Janson
    Accepted to ICLR 2025
  2. Deconstructing What Makes a Good Optimizer for Language Models
    Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade
    Accepted to ICLR 2025
  3. SOAP: Improving and Stabilizing Shampoo using Adam
    Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade
    Accepted to ICLR 2025
  4. How Does Critical Batch Size Scale in Pre-training?
    Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham Kakade
    Accepted to ICLR 2025

2023

  1. Feature emergence via margin maximization: case studies in algebraic tasks
    Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade
    Accepted as a Spotlight (5% acceptance rate) to ICLR 2024
  2. Simplicity Bias in 1-Hidden Layer Neural Networks
    Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli
    Accepted to NeurIPS 2023
  3. Feature-Learning Networks Are Consistent Across Widths At Realistic Scales
    Nikhil Vyas, Alex Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan
    Accepted to NeurIPS 2023
  4. Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
    Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak
    Accepted as a Spotlight (3.5% acceptance rate) to ICML 2024

2022

  1. Inductive bias of gradient descent for weight normalized smooth homogeneous neural nets
    Depen Morwani, Harish G. Ramaswamy
    Accepted to ALT 2022