An overview of methods for evaluating how well models find heterogeneous treatment effects in randomized controlled trials (“uplift models”), introducing two novel evaluation curves: the adjusted Qini curve and the efficiency curve.
Abstract
Uplift models seek to estimate individual treatment effects, helping practitioners answer questions like “who should we target with our treatment?” rather than simply “what is the individual treatment effect?”. This paper provides a comprehensive overview of evaluation methods for such models, with particular focus on:
- Qini-style curves and their variants (cumulative gain chart, adjusted Qini curve)
- Uplift and cumulative uplift curves
- The efficiency curve (novel contribution), which is particularly useful when uplift models generate bids in auction scenarios
- Expected response curves for A/B test comparison
We also discuss the transformed outcome method for measuring prediction error despite the fundamental problem of causal inference.
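To make the transformed outcome idea concrete, here is a minimal sketch, not the paper's implementation. It assumes a randomized trial in which every individual is treated with the same known probability `p`; the function names `transformed_outcome` and `transformed_outcome_mse` are hypothetical. Under that assumption, the transformed outcome `y * (w - p) / (p * (1 - p))` has conditional expectation equal to the individual treatment effect, so it can stand in for the unobservable true effect when scoring predictions.

```python
import numpy as np


def transformed_outcome(y, w, p):
    """Return the transformed outcome y * (w - p) / (p * (1 - p)).

    y: observed outcomes; w: treatment indicators (1 = treated, 0 = control);
    p: probability of treatment, assumed constant and known (an assumption
    of this sketch).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return y * (w - p) / (p * (1.0 - p))


def transformed_outcome_mse(y, w, p, predicted_uplift):
    """Mean squared error of uplift predictions against the transformed outcome."""
    y_star = transformed_outcome(y, w, p)
    predicted_uplift = np.asarray(predicted_uplift, dtype=float)
    return float(np.mean((y_star - predicted_uplift) ** 2))
```

The resulting error is not the true prediction error: in expectation it exceeds it by a noise term that does not depend on the model, so the sketch is better suited to comparing models on the same data than to reporting an absolute error.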
Key Contributions
- Adjusted Qini Curve: A new normalization that addresses treatment/control imbalances while avoiding quirks of existing methods
- Efficiency Curve: A novel evaluation approach for bidding scenarios where accurate estimation matters as much as ranking
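For orientation, the sketch below computes one common form of the plain (unadjusted) Qini curve: individuals are sorted by predicted uplift, and at each cutoff the cumulative responder counts in the treatment and control groups are each divided by that group's total size and then differenced. This is not the adjusted Qini curve or the efficiency curve, whose definitions are given in the paper; the function name `qini_curve` and its arguments are hypothetical.

```python
import numpy as np


def qini_curve(y, w, score):
    """Plain Qini curve values at every cutoff, highest predicted uplift first.

    y: binary outcomes; w: treatment indicators (1 = treated, 0 = control);
    score: predicted uplift used only for ranking.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    score = np.asarray(score, dtype=float)

    n_treated = w.sum()
    n_control = (1.0 - w).sum()
    if n_treated == 0 or n_control == 0:
        raise ValueError("both treatment and control groups must be non-empty")

    # Sort by descending score; a stable sort keeps ties in input order.
    order = np.argsort(-score, kind="stable")
    y, w = y[order], w[order]

    # Cumulative responders in each group among the top-k individuals.
    responders_treated = np.cumsum(y * w)
    responders_control = np.cumsum(y * (1.0 - w))

    return responders_treated / n_treated - responders_control / n_control
```

A model that ranks well should place treated responders early and control responders late, so its curve rises above the straight line that a random ordering produces on average.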
This work was conducted at Wayfair and published as a preprint in July 2020.
Co-authored with William T. Frost