World models & simulation @ Waymo · NeurIPS 2024 first-author · ex-Google Brain

Miles Hutson

Currently building a production-focused world model at Waymo, expanding it to represent high-dimensional outputs while keeping inference costs down. Previously model lead on DermAssist at Google Health. First author of a NeurIPS 2024 paper.

Get in touch GitHub LinkedIn Scholar

Experience

Where I've worked

Sep 2022 – present
Waymo

Senior Software Engineer
Now: Production-focused world model

Expanding the world model to represent high-dimensional outputs while keeping inference costs down.
- Rearchitected the transformer from a behavior-prediction model into a world-model architecture, and cut the overall parameter count by folding specialized modules into the decoder.
- Implemented a VQ-VAE that enables high-dimensional outputs and replaces bespoke, parameter-heavy per-output vocabularies with a single compressed output space.
- Serve as TL for ~5 ML engineers working on the model.
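The piece of a VQ-VAE that makes one shared output space possible is the quantization step: each continuous encoder vector is snapped to its nearest entry in a learned codebook, so any output becomes a sequence of discrete token ids. A minimal pure-Python sketch of that lookup, assuming a tiny hypothetical codebook; the function names are illustrative, it is not the Waymo implementation, and it omits training (codebook and commitment losses, straight-through gradients).

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Return (index, code vector) of the codebook entry nearest to z."""
    best = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    return best, codebook[best]


def encode_to_tokens(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each continuous latent vector to a discrete token id."""
    return [quantize(z, codebook)[0] for z in latents]


def decode_tokens(tokens: List[int], codebook: List[Vector]) -> List[Vector]:
    """Look token ids back up in the codebook to get the decoder's inputs."""
    return [codebook[t] for t in tokens]


# Hypothetical 4-entry codebook of 2-D code vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = encode_to_tokens([[0.1, -0.2], [0.9, 0.8], [0.2, 1.1]], codebook)
print(tokens)  # [0, 3, 2]
```

Because every output is expressed in the same token vocabulary, a single decoder head can emit all of them, instead of one vocabulary per output type.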
Prev: ML for Road Understanding

Improved the perception model's ability to understand the semantics of construction zones.
- Foundational model loss & architecture improvements.
- Evaluation methods that raised the quality of the deployed model.
Prev: ML for Behavior Prediction

Predicted the actions of cars, cyclists, and pedestrians so the car could safely share the road.
- Designed and ran ablations to reduce inputs and model complexity.
Feb 2017 – Sep 2022
Google

Senior Software Engineer
DermAssist · Google Health Dermatology

Model lead. Identified the most promising research and shepherded it into the commercial product.
- Trained the majority of models in the production classification ensemble; led ensemble distillation to cut resource use.
- Built continual-model-update infrastructure and new performance metrics for differential-diagnosis models.
- Also frontend TL for the CE-Mark-approved product (demoed at Google I/O); shipped the on-device TensorFlow.js image-quality checks.
- Onboarded new team members.
Medical labeling infrastructure · Google Health
- Labeling-data analysis library (labeler-performance and ground-truth estimation).
- Full IDE for developers to author labeling tasks.
- Clinical decision support system for healthcare professionals.
- Co-author on "Quality Control Challenges in Crowdsourcing Medical Labeling."
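For a sense of what labeler-performance and ground-truth estimation involve, here is a minimal baseline sketch: estimate each item's label by majority vote, then score each labeler by agreement with that consensus. This is an assumed, simplest-possible approach with hypothetical names and toy labels, not the library's actual method; weighted schemes such as Dawid-Skene go further by re-estimating the truth using per-labeler reliabilities.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# labels[item_id] -> list of (labeler_id, label)
Labels = Dict[str, List[Tuple[str, str]]]


def majority_vote(labels: Labels) -> Dict[str, str]:
    """Estimate ground truth per item as its most common label.

    Ties go to whichever tied label was seen first for that item.
    """
    truth = {}
    for item, votes in labels.items():
        counts = Counter(label for _, label in votes)
        truth[item] = counts.most_common(1)[0][0]
    return truth


def labeler_accuracy(labels: Labels, truth: Dict[str, str]) -> Dict[str, float]:
    """Fraction of each labeler's labels that agree with the consensus."""
    agree: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item, votes in labels.items():
        for labeler, label in votes:
            total[labeler] += 1
            agree[labeler] += int(label == truth[item])
    return {labeler: agree[labeler] / total[labeler] for labeler in total}


# Toy example with three hypothetical labelers.
labels = {
    "img1": [("a", "eczema"), ("b", "eczema"), ("c", "psoriasis")],
    "img2": [("a", "acne"), ("b", "acne"), ("c", "acne")],
    "img3": [("a", "psoriasis"), ("b", "eczema"), ("c", "psoriasis")],
}
truth = majority_vote(labels)
print(truth, labeler_accuracy(labels, truth))
```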
Applied-ML collaborations across Google
- CNN for endotracheal and nasogastric tube placement detection.
- T5-based tool to help job applicants formulate interview responses.
Earlier
- Google Brain.
- Ranking team within Google Cloud.
Jan – Dec 2016
University of Texas at Austin

Undergraduate Research Assistant
- MmmTurkey framework for Mechanical Turk data collection (HCOMP publication). The same lab later built on the framework for an HCOMP 2018 follow-up.
May – Aug 2016
Fitbit

Software Engineering Intern
- Sleep and wellness algorithms. Built tooling to compare research models against production behavior on the same inputs; contributed to the work behind Fitbit Sleep Stages.
Aug – Dec 2015
Texas Tribune

Digital Media Intern
- Data visualization for news stories.
- Internal tool that let journalists explore the Tribune's structured datasets.
May – Aug 2015
Blackbaud

Software Engineering Intern
- Debugged and shipped features on a platform for non-profit fundraising.
2012 – 2015
The Daily Texan

Digital Projects Lead → Senior Staff → Reporter
- Led a 5-person interactives team. D3, Drupal, Python.

Education

School

2021 – 2024
Stanford University

M.S. Computer Science

Part-time, 4.0 GPA. AI/ML track. Independent project in model-based RL became a NeurIPS 2024 first-author paper.
Independent Project

CS399

Model-based RL research that became "Policy-Shaped Prediction" — NeurIPS 2024 first-author paper.

arXiv →
Deep Multi-Task and Meta Learning

CS330

Bird-song classifier applied to a novel dataset (with Andreas Paepcke).

Video →
Trustworthy Machine Learning

CS329T

Black-box attack on text-summarization models that succeeds with less data than prior methods.
Deep Generative Models

CS236

World Models derivative: an agent that learns to play a video game while training an autoencoder that prioritizes encoding what matters for control.

Video →
GitHub →
Paper →
Natural Language Processing with Deep Learning

CS224N

Generating fake news articles from article summaries with GRU and DCNN models.

Paper →
Artificial Intelligence: Principles and Techniques

CS221

Assigning left/right bias labels to news articles using NLP and explainability techniques.
Decision Making Under Uncertainty

AA228

Track simulator for a self-driving model car using particle-filter navigation.

Paper →
Principles of Robot Autonomy I

AA274A

Robot that simultaneously localized itself and mapped its environment and the objects in it, then executed a retrieval mission.

Principles of Robot Autonomy II

Interactive and Embodied Learning

Advanced Topics in Networking
2012 – 2016

University of Texas at Austin

B.S. Computer Science

3.87 GPA. Transferred from Journalism junior year. Phi Beta Kappa, Temple Scholar, Liberal Arts Honors, Unrestricted Endowed Presidential Scholarship.

Selected work

Things I've built

Production-focused world model — Waymo

Waymo · 2024 – present · Senior SWE

Current focus: expanding a production-focused world model to represent high-dimensional outputs while keeping inference costs down.

Policy-Shaped Prediction

NeurIPS 2024 · 2024 · First author

Reconstruction-based world models can waste capacity on pixel detail that is highly predictable but irrelevant to the task, and leading model-based RL methods, including DreamerV3 and DreamerPro, remain vulnerable to this. We use a pretrained segmentation model, a task-aware reconstruction loss, and adversarial learning to focus the world model on what matters for control, recovering performance under intricate, predictable, but useless distractors.
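To make "task-aware reconstruction loss" concrete, here is a minimal sketch of one way to combine it with segmentation: a per-pixel importance map is averaged within each segment, so whole objects are up- or down-weighted together, and those weights scale the squared reconstruction error. The importance map (for example, some policy-gradient saliency) and the segment ids are assumed inputs, the names are hypothetical, and this is an illustration of segment-aggregated loss weighting rather than the paper's exact formulation; it also leaves out the adversarial component.

```python
from collections import defaultdict
from typing import Dict, List

Grid = List[List[float]]


def segment_average(importance: Grid, segments: List[List[int]]) -> Grid:
    """Replace each pixel's importance with the mean importance of its segment."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for imp_row, seg_row in zip(importance, segments):
        for imp, seg in zip(imp_row, seg_row):
            sums[seg] += imp
            counts[seg] += 1
    return [[sums[seg] / counts[seg] for seg in seg_row] for seg_row in segments]


def weighted_reconstruction_loss(
    target: Grid, recon: Grid, importance: Grid, segments: List[List[int]]
) -> float:
    """Squared error weighted by segment-averaged importance, normalized by total weight."""
    weights = segment_average(importance, segments)
    total_w = sum(sum(row) for row in weights)
    if total_w == 0:
        raise ValueError("importance map must have nonzero total weight")
    loss = 0.0
    for t_row, r_row, w_row in zip(target, recon, weights):
        for t, r, w in zip(t_row, r_row, w_row):
            loss += w * (t - r) ** 2
    return loss / total_w


# Toy 2x2 frame: top row is a task-relevant object (segment 0),
# bottom row is a distractor (segment 1) that the model reconstructs badly.
segments = [[0, 0], [1, 1]]
importance = [[1.0, 0.0], [0.0, 0.0]]
target = [[1.0, 1.0], [0.0, 0.0]]
recon = [[0.0, 1.0], [5.0, 5.0]]
print(weighted_reconstruction_loss(target, recon, importance, segments))
```

In the toy frame the large error on the distractor segment contributes nothing, so the loss only reflects the task-relevant object.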

DermAssist

Google Health · 2019 – 2022 · Model lead · Frontend TL

Consumer dermatology tool — computer vision to suggest possible matches for skin, hair, and nail conditions. CE-Mark approved, demoed at Google I/O.

As model lead I trained the majority of the production classification ensemble, designed the differential-diagnosis metric, and led the ensemble distillation that shrank the model's footprint.
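Ensemble distillation commonly trains one student model to match the ensemble's averaged predictive distribution. A minimal sketch of that soft target and a cross-entropy distillation loss, assuming each member outputs class probabilities and no temperature scaling; the function names are hypothetical and this does not describe DermAssist's actual recipe.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def ensemble_target(member_probs: List[Sequence[float]]) -> List[float]:
    """Average the class distributions of all ensemble members."""
    n = len(member_probs)
    num_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n for c in range(num_classes)]


def distillation_loss(student_logits: Sequence[float], target: Sequence[float]) -> float:
    """Cross-entropy between the ensemble's soft target and the student's softmax."""
    student = softmax(student_logits)
    return -sum(t * math.log(max(s, 1e-12)) for t, s in zip(target, student))


# Two hypothetical ensemble members over three classes.
target = ensemble_target([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])
print(target, distillation_loss([0.0, 0.0, 0.0], target))
```

The loss is minimized when the student's softmax equals the ensemble average, at which point it equals that distribution's entropy.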

As frontend TL I shipped the on-device TensorFlow.js image-quality checks. I also built the continual-update pipeline and the Post-Market Monitoring system that tracks live model performance in the wild.

Drone depth from a single camera

2021 · Monocular depth · VR

Monocular depth prediction trained on drone footage, then re-projected into a 3D point cloud you could walk through in VR. The fun part: depth-from-motion gives you most of the signal without needing stereo rigs or LiDAR.

Medium write-up →
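Turning a predicted depth map into a walkable point cloud is a back-projection through the pinhole camera model: a pixel (u, v) with depth Z maps to X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy. A minimal sketch, assuming known intrinsics (the values below are hypothetical) and depth measured along the optical axis.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point]:
    """Back-project every pixel with positive depth into camera coordinates."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map and unit focal lengths.
cloud = depth_to_point_cloud([[2.0, 0.0], [4.0, 1.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud)
```

Here Y points down to match image row order, so a VR scene may need an axis flip before rendering.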

Transcribing screenshots of Reddit posts

2020 · OCR · Transformers

An OCR + Transformer baseline for extracting text from Reddit posts that get shared around as screenshots, plus the dataset I built and trained it on.

Catbot

2019 · YOLO · Pi · OpenCV

YOLO-based pursuit robot, originally designed to chase a cat. Reprogrammed mid-demo to chase water bottles for safety reasons.

Hardware

Raspberry Pi 3
Raspberry Pi Camera
Adafruit motor controller
Ultrasonic rangefinder
2× OSEPP tank platform kits

DIY Robocar

2019 · Particle filter · Lane detection

Autonomous RC car for DIY Robocar races. Iterated through two approaches — a simulator-trained particle filter, then a perspective-transform + hue-based lane detector.

Approach 1 — particle filter

Built a track simulator and trained particle-filter localization with online path planning against it. Worked well in sim; transferred poorly to the physical track.
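The predict, weight, resample loop at the heart of particle-filter localization can be sketched in one dimension (position along the track). This assumes a hypothetical sensor that reports position with Gaussian noise and made-up noise parameters; the original simulator and onboard sensing aren't described here, and the noise model is exactly the kind of simulator assumption that may not hold on a physical track.

```python
import math
import random
from typing import List


def gaussian_likelihood(error: float, sigma: float) -> float:
    """Unnormalized Gaussian likelihood of a measurement error."""
    return math.exp(-0.5 * (error / sigma) ** 2)


def particle_filter_step(
    particles: List[float],
    control: float,
    measurement: float,
    motion_sigma: float,
    sensor_sigma: float,
    rng: random.Random,
) -> List[float]:
    """One predict-weight-resample cycle for 1-D position along the track."""
    # Predict: apply the commanded motion plus process noise to every particle.
    moved = [p + control + rng.gauss(0.0, motion_sigma) for p in particles]
    # Weight: score each particle by how well it explains the measurement.
    weights = [gaussian_likelihood(measurement - p, sensor_sigma) for p in moved]
    if sum(weights) == 0.0:
        return moved  # degenerate case: keep the predicted cloud unchanged
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles: List[float]) -> float:
    """Point estimate of position as the particle mean."""
    return sum(particles) / len(particles)


# Simulated run: the car advances 0.5 units per step from an unknown start.
rng = random.Random(0)
true_pos = 0.0
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
for _ in range(20):
    true_pos += 0.5
    z = true_pos + rng.gauss(0.0, 0.3)
    particles = particle_filter_step(particles, 0.5, z, 0.1, 0.3, rng)
print(true_pos, estimate(particles))
```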

Approach 2 — lane detection

Simpler and more robust: perspective transform from the onboard camera, hue-based segmentation of the painted lane lines, and a steering controller driven by the detected lane geometry.
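The hue-segmentation and steering stages can be sketched in pure Python. This assumes the frame has already been perspective-warped to a top-down view (the warp itself is omitted), that the lane paint is yellowish (the hue band is a hypothetical choice), and a simple proportional controller on the lane's horizontal offset; the function names are hypothetical and the original controller's details aren't given.

```python
import colorsys
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]


def lane_mask(
    image: List[List[RGB]], hue_lo: float, hue_hi: float, min_sat: float = 0.4
) -> List[List[bool]]:
    """Mark pixels whose hue (0-1 scale) is in [hue_lo, hue_hi] and saturation is high enough."""
    mask = []
    for row in image:
        out_row = []
        for r, g, b in row:
            h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            out_row.append(hue_lo <= h <= hue_hi and s >= min_sat)
        mask.append(out_row)
    return mask


def lane_offset(mask: List[List[bool]]) -> Optional[float]:
    """Mean column of lane pixels relative to image center, scaled to [-1, 1].

    With both lane lines visible, the mean column approximates the lane center.
    """
    cols = [u for row in mask for u, on in enumerate(row) if on]
    if not cols:
        return None
    center = (len(mask[0]) - 1) / 2
    return (sum(cols) / len(cols) - center) / center


def steering_command(offset: Optional[float], gain: float = 0.8, last: float = 0.0) -> float:
    """Proportional steering (positive = right); hold the last command if no lane is seen."""
    if offset is None:
        return last
    return max(-1.0, min(1.0, gain * offset))


# Toy 3x5 top-down frame: gray road with a yellow line right of center.
Y, G = (255, 255, 0), (100, 100, 100)
frame = [[G, G, G, Y, G] for _ in range(3)]
offset = lane_offset(lane_mask(frame, 0.1, 0.2))
print(offset, steering_command(offset))
```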