Vision + Language

Video Action Differencing

James Burgess, Xiaohan Wang, Yuhui Zhang, Anita Rau, Alejandro Lozano, Lisa Dunlap, Trevor Darrell, Serena Yeung-Levy
[ICLR 2025] Website | Paper | Code
TL;DR given two videos of an action, describe how they differ
VisionArena: 230K Real World User-VLM Conversations with Preference Labels

Christopher Chou*, Lisa Dunlap*, Koki Mashita, Krishna Mandal, Trevor Darrell, Ion Stoica, Joseph E. Gonzalez, Wei-Lin Chiang
[CVPR 2025] Paper | Code | Dataset
TL;DR It’s the data release for Chatbot Arena, a platform for crowdsourcing preference votes.
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models

Lisa Dunlap, Krishna Mandal, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez
[ICLR 2025] Website | Paper | Code | Dataset
TL;DR We find qualitative properties (vibes) in LLMs and measure how well they can distinguish models and predict user preference
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

Tianle Li*, Wei-Lin Chiang*, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica
Paper | Code | Blog | Dataset | Leaderboard
TL;DR Filter large, messy NLP datasets into a smaller set of high-quality prompts using LLMs
Describing Differences in Image Sets with Natural Language

Lisa Dunlap*, Yuhui Zhang*, Xiaohan Wang, R. Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy
[CVPR 2024 (oral)] Paper | Code | Website
TL;DR Set Difference Captioning - describing differences in two large sets of images with language - has many impactful ML & data science applications
See, Say, and Segment: Teaching LMMs to Overcome False Premises

Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell
[CVPR 2024] Paper | Website | Code
TL;DR we train segmentation-VQA models to see if an object is present, say if the object isn't present and suggest alternatives, and segment said object
Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation (ALIA)

L. Dunlap, A. Umino, P. Zhang, J. Yang, J. E. Gonzalez, T. Darrell
[NeurIPS 2023] Paper | Code | Website
TL;DR V&L models can summarize high-level spurious features in your data with language which can be used to augment your data with diffusion models
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

G. Luo, L. Dunlap, D. Huk Park, A. Holynski, T. Darrell
[NeurIPS 2023] Paper | Website | Code
TL;DR aggregating diffusion features over different layers and timesteps leads to fantastic features for semantic correspondence
Using Language to Extend to Unseen Domains (LADS)

L. Dunlap, C. Mohri, D. Guillory, H. Zhang, T. Darrell, J. E. Gonzalez, A. Raghunathan, A. Rohrbach
[ICLR 2023 (spotlight)] Website | Paper | Code | Blog | Slides
TL;DR it’s UDA, but instead of unlabeled target data it’s language, and you want to maximize target accuracy while maintaining source accuracy
On Guiding Attention with Language Specification (GALS)

S. Petryk*, L. Dunlap*, K. Nasseri, J. E. Gonzalez, T. Darrell, A. Rohrbach.
[CVPR 2022] Paper | Code
TL;DR saliency of V+L models can be used to guide CNN training on biased data using a language description of what to focus on
NBDT: Neural-Backed Decision Trees

A. Wan, L. Dunlap*, D. Ho*, J. Yin, S. Lee, H. Jin, S. Petryk, S. A. Bargal, and J. E. Gonzalez.
[ICLR 2021] Paper | Website | Blog | Talk
TL;DR training a CNN to have the class hierarchy of a decision tree increases accuracy and interpretability
Deep Mixture of Experts Via Shallow Embedding

X. Wang, F. Yu, L. Dunlap, R. Wang, Y. A. Ma, A. Mirhoseini, T. Darrell, and J. E. Gonzalez.
[UAI 2019] Paper
TL;DR lots of MoEs + sparse gating network = better accuracy and less computation

ML Systems

Improve Model Inference Cost with Image Gridding

S. Krishnaswamy, L. Dunlap, L. Chen, M. Zaharia, J. Zou, J. Gonzalez
[ICML 2023 DMLR workshop] Paper
TL;DR reduce vision model API costs by gridding your images together
Hyperparameter Tuning with Elastic Resources

L. Dunlap, K. Kandasamy, U. Mishra, R. Liaw, J. Gonzalez, I. Stoica, M. Jordan.
[SoCC 2021] Paper | Talk
TL;DR given a deadline and a cloud budget, produce an optimal HP tuning experiment
RubberBand: Cloud Based Hyperparameter Tuning

R. Liaw*, U. Mishra*, L. Dunlap, R. Bhardwaj, A. Tumanov, J. Gonzalez, I. Stoica.
[EuroSys 2021] Paper | Talk
TL;DR given an HP tuning experiment and time deadline, minimize cost on the cloud
Hypersched: Dynamic resource allocation for model development on a deadline

R. Liaw, R. Bhardwaj, L. Dunlap, A. Tumanov, J. E. Gonzalez, I. Stoica
[SoCC 2019] Paper
TL;DR when HP tuning on a time deadline, dynamically allocate resources to jobs

Misc

[Joke] MICKIE: The Magically Interpretable Cloud Komputing Inference Engine

Everyone on the bus to the 2022 Sky Disneyland Retreat (but mostly Conor Power)
[Under arxiv ethics review] Paper | Code
TL;DR hear me out - large language models WILL replace the cloud
Machine Log Parsing with Named Entity Recognition 

L. Dunlap, A. Starosta, K. Curtis, Z. Wang, C. Sarkar, R. Sriharsha.
[Nvidia GTC 2021] Blog
TL;DR NER models work surprisingly well for log parsing
Habitat-dependent search behavior in the Colorado Checkered Whiptail (Aspidoscelis neotesselata)

K. Utsumi, C. Kusaks, R. Pedersen, C. Staley, L. Dunlap, S. G. Smith, M. A. Eifler, D. A. Eifler.
[Western North American Naturalist 2019] Paper
TL;DR whiptails behave differently in shrub grassland vs. pine-juniper woodland
