Publications
Publications in reverse chronological order. For a complete list, see Google Scholar.
2025
-
The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
Accepted to ICLR 2026
-
Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
Accepted to ICLR 2026
-
LOTION: Smoothing the Optimization Landscape for Quantized Training
Under submission at ICML 2026
-
Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
Under submission at ICML 2026
-
Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants
Under submission at ICML 2026
-
A Simplified Analysis of SGD for Linear Regression with Weight Averaging
Optimization for Machine Learning (OPT) Workshop, 2025
2024
2023
-
Feature emergence via margin maximization: case studies in algebraic tasks
Accepted as Spotlight (Acceptance Rate 5%) to ICLR 2024
-
Feature-Learning Networks Are Consistent Across Widths At Realistic Scales
Accepted to NeurIPS 2023
-
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Accepted as Spotlight (Acceptance Rate 3.5%) to ICML 2024
2022
-
Inductive bias of gradient descent for weight normalized smooth homogeneous neural nets
Accepted to ALT 2022