
Towards a ‘Chemistry of AI’: Unveiling the Structure of Training Data for more Scalable and Robust Machine Learning

This is a past event.

Thursday, October 9, 2025 2:30pm to 3:30pm



Science and Engineering Complex (SEC), LL2.224

Title: Towards a ‘Chemistry of AI’: Unveiling the Structure of Training Data for more Scalable and Robust Machine Learning

Speaker: David Alvarez-Melis, Assistant Professor, Harvard SEAS

Abstract: Recent advances in AI have underscored that data, rather than model size, is now the primary bottleneck in large-scale machine learning performance. Yet, despite this shift, systematic methods for dataset curation, augmentation, and optimization remain underdeveloped. In this talk, I will argue for the need for a “Chemistry of AI”—a paradigm that, like the emerging “Physics of AI,” embraces a principles-first, rigorous, empiricist approach but shifts the focus from models to data. This perspective treats datasets as structured, dynamic entities that can be transformed through optimization and seeks to characterize their fundamental properties, composition, and interactions. I will then highlight some of our recent work that takes initial steps toward establishing this framework, including principled methods for dataset synthesis and surprising recent findings in dataset distillation. Together, these findings highlight emerging principles in data-centric ML and suggest new levers for steering model behavior via thoughtful data design, composition, and representation.

Event Details