Robert Yi

Co-Founder & CTO @ Oxygen Intelligence · 2024 - Present

Open-source framework for agentic analytics

I’m a physicist by training, drawn to uncovering and working with the structures beneath complex behaviors. For a long time, that meant studying physical systems—how rivers carve predictable patterns from diffusive flows, how geometry emerges from pressure gradients.

More recently, I’ve been building startups and open-source software, and company-building has offered many chances to reapply this instinct: how to design robust engineering systems, how to help people work with data more effectively, and how intellectual honesty (or the lack thereof) shapes outcomes more than execution theater.

Below, you’ll find a high-level overview of my work and educational history, as well as a few technical posts and papers that couldn’t find a home elsewhere. For other writing, see think.ryi.me.

Previously

Hyperquery · 2020 - 2024

Co-Founder & CTO · Next-gen data notebook

Khosla-backed, acquired by Deepnote

Airbnb · 2019 - 2020

Senior Data Scientist

Uplift modeling, CUPED innovations, product analytics · Created whale

Wayfair · 2017 - 2019

ML Technical Lead

User-level ad bidding · Created pylift

Education

MIT · 2011 - 2017

PhD in Geophysics · Thesis

Pattern formation · Machine learning · Complex analysis

Harvard · 2006 - 2010

A.B. in Physics, Honors

Selected Publications

Uplift model evaluation for randomized control trials · Preprint, 2020

Shapes of river networks · Proc. R. Soc. A, 2018

Pylift: A fast Python package for uplift modeling · Wayfair Tech Blog, 2018

Symmetric rearrangement of groundwater-fed streams · Proc. R. Soc. A, 2018

A free-boundary model of diffusive valley growth · Proc. R. Soc. A, 2017

Path selection in the growth of rivers · PNAS, 2015

Uplift model evaluation for randomized control trials

An overview of methods to evaluate the effectiveness of models in finding heterogeneous treatment effects in randomized control trials (“uplift models”), introducing two novel evaluation curves: the adjusted Qini curve and the efficiency curve.

Read the full paper (PDF)

Abstract

Uplift models seek to estimate individual treatment effects, helping practitioners answer questions like “who should we target with our treatment” rather than simply “what is the individual treatment effect”. This paper provides a comprehensive overview of evaluation methods for such models, with particular focus on:...

An introduction to unbiased [and doubly robust] estimators

Often, data collection cannot be completely random – e.g. in clinical trials, where it would be unethical to randomly treat people with medicine, or in online surveys, where response cannot be guaranteed. In such cases, the data can be biased, so inferences drawn or machine learning models built from it will not generalize well to the overall population. This is where unbiased estimation comes in: small adjustments are effectively made to the dataset to make it more representative of a random sample....
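As a rough illustration of the kind of adjustment described above, here is a minimal sketch of inverse-probability weighting, one common way to reweight a biased sample. It is not necessarily the approach the post takes, and it does not cover doubly robust estimators. The survey numbers, the response probabilities, and the function names `naive_mean` and `ipw_mean` are all hypothetical, and the response probabilities are assumed to be known, which is rarely true in practice.

```python
def naive_mean(values):
    """Plain average of the observed responses, ignoring who was likely to respond."""
    return sum(values) / len(values)


def ipw_mean(values, response_probs):
    """Self-normalized inverse-probability-weighted mean.

    Each observed value is weighted by 1 / (its probability of being observed),
    so under-represented respondents count for more.
    """
    weights = [1.0 / p for p in response_probs]
    weighted_total = sum(w * v for w, v in zip(weights, values))
    return weighted_total / sum(weights)


# Hypothetical survey (assumed numbers): the population is half group A
# (answer 1.0) and half group B (answer 0.0), so the true mean is 0.5.
# Group A responds with probability 0.8, group B with probability 0.2,
# so out of 10 people invited per group we observe 8 from A and 2 from B.
values = [1.0] * 8 + [0.0] * 2
response_probs = [0.8] * 8 + [0.2] * 2

naive = naive_mean(values)
adjusted = ipw_mean(values, response_probs)
print(naive, adjusted)
```

In this toy setup the naive mean is 0.8, because group A is over-represented among respondents, while the weighted mean recovers the population value of 0.5.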

Emergent geometries of groundwater-fed rivers

My PhD thesis on emergent geometric complexity in natural systems, using river networks as a rich example of how simple constituent interactions produce novel structures and statistical properties across multiple scales.

Download PDF (65 MB) | MIT DSpace

Abstract

This thesis explores emergence (how novel structures arise from collective interactions between constituent entities) through the lens of river network geometry. River networks exemplify emergent complexity: simple physical processes (erosion, diffusion, pressure gradients) interact to produce geometric patterns and power-law statistics that appear fundamentally different from their microscopic origins....