The prevalent paradigm in robot learning attempts
to generalize across environments, embodiments, and tasks with language prompts at
runtime. A fundamental tension limits this approach: language is often too abstract
to guide the concrete physical understanding required for robust manipulation. In this
work, we introduce Contact-Anchored Policies (CAP), a new class of general
robotic behavior models that replace language conditioning with points of physical
contact in space. We also structure CAP
as a library of modular utility models rather than a monolithic generalist
policy. This factorization allows us to implement an efficient real-to-sim iteration cycle: we build
EgoGym, a lightweight simulation benchmark, to rapidly
identify failure modes and refine our models and datasets prior to real-world
deployment.
We show that by conditioning on contact and iterating via simulation,
CAP generalizes to novel environments and embodiments out of the box on three
fundamental manipulation skills while using only 23 hours of demonstration data, and
outperforms large, state-of-the-art vision-language-action models (VLAs) in zero-shot
evaluations by 56%. All model checkpoints, codebase, hardware, simulation, and
datasets will be open-sourced.