About Me

I am currently a Kempner Research Fellow at the Kempner Institute at Harvard University. In Fall 2026, I will start as an Assistant Professor at MIT with a shared appointment between Mathematics and EECS (AI+D).

I received my Ph.D. in Applied and Computational Mathematics at Princeton University under the supervision of Jason D. Lee, and my B.S. in Mathematics at Duke University, where I was fortunate to work with Cynthia Rudin and Hau-Tieng Wu.

Research Interests

My research is focused on the mathematical foundations of deep learning. Some fun directions I’ve worked on are:

Deep Learning Optimization Dynamics

My goal is to develop a predictive, and ultimately prescriptive, theory for deep learning optimization. This requires grappling with settings not captured by classical optimization theory. For example, large-batch training typically occurs in a chaotic regime called the Edge of Stability (pictured). I’ve studied how different optimizers navigate the Edge of Stability regime in order to provide simple explanations of their behavior. This line of work is summarized in this blogpost and in the papers Self-Stabilization and Central Flows.

You can also click this link for a fun visualization of limit cycles and chaos in Adam.
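If you're unfamiliar with the stability threshold behind this regime, here is a minimal numerical sketch (my own toy illustration, not code from the papers above). Gradient descent with learning rate η on a quadratic with curvature λ is stable only when λ < 2/η; the Edge of Stability refers to the empirical observation that, during neural network training, the largest Hessian eigenvalue rises until it hovers right at this 2/η threshold rather than staying safely below it.

```python
# Toy illustration (not from the papers above): gradient descent on the quadratic
# f(x) = (lam / 2) * x^2 has the update x <- (1 - eta * lam) * x, which is stable
# iff lam < 2 / eta.  The Edge of Stability is the regime where the loss curvature
# sits right at this threshold during neural network training.
import numpy as np

def gd_on_quadratic(lam, eta, x0=1.0, steps=30):
    """Run gradient descent on f(x) = (lam / 2) * x**2 and return the iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * lam * xs[-1])  # x <- x - eta * f'(x)
    return np.array(xs)

eta = 0.1  # 2 / eta = 20 is the stability threshold
for lam in [5.0, 19.0, 21.0]:
    regime = "stable" if lam < 2 / eta else "unstable"
    xs = gd_on_quadratic(lam, eta)
    print(f"curvature {lam:5.1f} ({regime:8s}): |x_30| = {abs(xs[-1]):.3e}")
```

On a quadratic the unstable case simply diverges; what makes neural networks interesting (and what Self-Stabilization and Central Flows analyze) is that the non-quadratic loss landscape pushes the curvature back down, so training oscillates around the threshold instead of blowing up.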

Representation Learning in Simple Models

The miracle of deep learning is that neural networks automatically extract meaningful representations from raw data during the optimization process. To gain insights into this process, I’ve studied the optimization dynamics of simple models trained on synthetic data to ask: What representations are learned? How many samples does the network need to learn them? What signals in the gradient help guide optimization towards them? I’ve worked on these questions in both feed-forward neural networks (MLPs) [1][2][3] and Transformers [4][5].

Computational-to-Statistical Gaps

Many high-dimensional learning problems exhibit a conjectured gap between the number of samples needed to solve the problem information-theoretically and the number of samples needed by polynomial-time algorithms. This implies a fundamental tradeoff between runtime and sample complexity. I’ve studied this tradeoff in Gaussian single-index [3][6] and multi-index [7] models to identify structures that can make learning problems hard or easy.
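For concreteness, here is a simplified description of the Gaussian single-index setting (notation is mine, not verbatim from the papers): the learner sees i.i.d. Gaussian inputs whose labels depend on a single hidden direction, and the question is how many samples are needed to recover that direction.

```latex
% Simplified Gaussian single-index model (illustrative notation).
% Samples:  x_i is a standard Gaussian in d dimensions,
%           y_i depends on x_i only through one hidden direction w*.
\[
  x_i \sim \mathcal{N}(0, I_d), \qquad
  y_i \sim \mathbb{P}\!\left(\,\cdot \mid \langle w^\ast, x_i \rangle\,\right),
  \qquad i = 1, \dots, n,
\]
\[
  \text{Goal: estimate } w^\ast \in \mathbb{S}^{d-1}
  \text{ from } (x_1, y_1), \dots, (x_n, y_n).
\]
```

Information-theoretically, on the order of d samples already pin down w*, but polynomial-time algorithms are conjectured to require polynomially more, with the exponent determined by properties of the label distribution.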

Recruiting

I am actively looking for students starting in Fall 2026. If you are interested in working with me, please apply to either the Mathematics or EECS department at MIT and list my name in your application.

Selected Publications

Learning Compositional Functions with Transformers from Easy-to-Hard Data

Zixuan Wang*, Eshaan Nichani*, Alberto Bietti, Alex Damian, Daniel Hsu, Jason D. Lee, Denny Wu

Understanding Optimization in Deep Learning with Central Flows

Jeremy M. Cohen*, Alex Damian*, Ameet Talwalkar, J. Zico Kolter, Jason D. Lee

Computational-Statistical Gaps in Gaussian Single-Index Models

Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna

How Transformers Learn Causal Structure with Gradient Descent

Eshaan Nichani, Alex Damian, Jason D. Lee

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

Eshaan Nichani, Alex Damian, Jason D. Lee

Neural Networks can Learn Representations with Gradient Descent

Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

Label Noise SGD Provably Prefers Flat Global Minimizers

Alex Damian, Tengyu Ma, Jason D. Lee

Awards