Publications

Publications in reverse chronological order. For a complete list, see Google Scholar.

2025

  1. The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
    Natalie Abreu, Nikhil Vyas, Sham Kakade, Depen Morwani
    Accepted to ICLR 2026
  2. Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
    Alexandru Meterez, Depen Morwani, Jingfeng Wu, Costin-Andrei Oncescu, Cengiz Pehlevan, Sham Kakade
    Accepted to ICLR 2026
  3. LOTION: Smoothing the Optimization Landscape for Quantized Training
    Mujin Kwun, Depen Morwani, Chloe Huangyuan Su, Stephanie Gil, Nikhil Anand, Sham Kakade
    Under submission at ICML 2026
  4. Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
    Bingbin Liu, Rachit Bansal, Depen Morwani, Nikhil Vyas, David Alvarez-Melis, Sham Kakade
    Under submission at ICML 2026
  5. Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants
    Depen Morwani, Nikhil Vyas, Hanlin Zhang, Sham Kakade
    Under submission at ICML 2026
  6. A Simplified Analysis of SGD for Linear Regression with Weight Averaging
    Alexandru Meterez, Depen Morwani, Costin-Andrei Oncescu, Jingfeng Wu, Cengiz Pehlevan, Sham Kakade
    Optimization for Machine Learning (OPT) Workshop, 2025

2024

  1. A New Perspective on Shampoo's Preconditioner
    Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham Kakade, Lucas Janson
    Accepted to ICLR 2025
  2. Deconstructing What Makes a Good Optimizer for Language Models
    Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade
    Accepted to ICLR 2025
  3. SOAP: Improving and Stabilizing Shampoo using Adam
    Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade
    Accepted to ICLR 2025
  4. How Does Critical Batch Size Scale in Pre-training?
    Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham Kakade
    Accepted to ICLR 2025

2023

  1. Feature emergence via margin maximization: case studies in algebraic tasks
    Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade
    Accepted as a Spotlight (5% acceptance rate) to ICLR 2024
  2. Simplicity Bias in 1-Hidden Layer Neural Networks
    Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli
    Accepted to NeurIPS 2023
  3. Feature-Learning Networks Are Consistent Across Widths At Realistic Scales
    Nikhil Vyas, Alex Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan
    Accepted to NeurIPS 2023
  4. Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
    Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak
    Accepted as a Spotlight (3.5% acceptance rate) to ICML 2024

2022

  1. Inductive bias of gradient descent for weight normalized smooth homogeneous neural nets
    Depen Morwani, Harish G. Ramaswamy
    Accepted to ALT 2022