Research
I'm broadly interested in deep learning, generative models, and physical AI. Specifically, I'm interested in scaling deep learning with principled techniques that efficiently utilize data and compute.
Cosmos World Foundation Model Platform for Physical AI
NVIDIA
arXiv, 2025
project page / arXiv / code / keynote / press: New York Times, Wall Street Journal, Fortune, TechCrunch, Forbes, Wired, BBC
Generative world foundation models for data-driven simulation of physical AI systems.
Training Video Foundation Models with NVIDIA NeMo
NVIDIA: Zeeshan Patel (Lead Contributor)
arXiv / technical blog / code
An open-source framework for training video foundation models, providing accelerated video dataset curation, multimodal dataloading, and parallelized video diffusion model training and inference.
Scaling Properties of Diffusion Models for Perceptual Tasks
Zeeshan Patel*,
Rahul Ravishankar*,
Jathushan Rajasegaran,
Jitendra Malik
CVPR, 2025
project page / arXiv / code
Iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception. We unify tasks such as depth estimation, optical flow, and segmentation under image-to-image translation, and show how diffusion models benefit from scaling both training and test-time compute on these perception tasks.
Exploring Diffusion and Flow Matching Under Generator Matching
Zeeshan Patel*,
James DeLoye,
Lance Mathias
Preprint, 2024
arXiv
We explore diffusion and flow matching models under the theoretical framework of generator matching. Our analysis offers a fresh perspective on the relationships between these state-of-the-art generative modeling paradigms and how to build new generative Markov processes that benefit from both approaches.
SWAG: Storytelling With Action Guidance
Zeeshan Patel*,
Jonathan Pei*,
Karim El-Refai*,
Tianle Li
EMNLP, 2024
arXiv
We introduce Storytelling With Action Guidance (SWAG), a novel approach to storytelling with LLMs. Our approach reduces story writing to a search problem through a two-model feedback loop. SWAG can optimize open-source LLMs to substantially outperform previous end-to-end story generation techniques that rely on closed-source models.
Test-Time Training for Image Superresolution
Zeeshan Patel*,
Yossi Gandelsman
Preprint, 2023
paper / code
We present a self-supervised test-time training approach for fine-tuning image superresolution models so they adapt to new test distributions on the fly.