Abstract

HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks.

Open-Vocabulary Mobile Manipulation (OVMM) is a core challenge for robotics research because it involves bringing together four key capabilities: perception, language understanding, navigation, and manipulation, all of which will be necessary for robots to be useful assistants in human environments. OVMM is a foundational challenge for generally useful robots precisely because it requires tackling and integrating all of these components. To drive research in this area, we introduce the HomeRobot OVMM challenge, where an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, where we provide a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both a reinforcement learning and a heuristic (model-based) baseline and show evidence of sim-to-real transfer.

Real-world success cases

"Move the stuffed animal from the chair to the sofa."

"Move the elephant from the chair to the table."


Sim success cases

"Move the multiport hub from the stool to the table."

"Move the toy from the table to the stool."

"Move the box from the stand to the chair."
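
The instructions shown in the success cases all follow one pattern: "Move the <object> from the <start receptacle> to the <goal receptacle>." As a rough illustration only, the sketch below splits such an instruction into its three parts. It assumes this template always holds; the `OVMMGoal` type, the `parse_instruction` function, and the regular expression are hypothetical names introduced here and are not part of the HomeRobot software stack.

```python
import re
from typing import NamedTuple, Optional


class OVMMGoal(NamedTuple):
    """Hypothetical container for the three parts of an OVMM instruction."""
    object_name: str
    start_receptacle: str
    goal_receptacle: str


# Assumed template: "Move the <object> from the <start> to the <goal>."
# The trailing period is optional, since some instructions omit it.
_TEMPLATE = re.compile(
    r"^\s*move the (?P<obj>.+?) from the (?P<start>.+?) to the (?P<goal>.+?)\.?\s*$",
    re.IGNORECASE,
)


def parse_instruction(text: str) -> Optional[OVMMGoal]:
    """Return the parsed goal, or None if the text does not fit the template."""
    # Drop surrounding whitespace and straight or curly double quotes.
    cleaned = text.strip().strip('"\u201c\u201d')
    match = _TEMPLATE.match(cleaned)
    if match is None:
        return None
    return OVMMGoal(match["obj"], match["start"], match["goal"])


if __name__ == "__main__":
    print(parse_instruction('"Move the multiport hub from the stool to the table."'))
```

Multi-word names such as "chest of drawers" are handled because each field is matched lazily up to the next " from the " or " to the " separator. An object name that itself contains one of those phrases would be split incorrectly, so a real system would need something more robust than this pattern.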


Analysis: Comparing baselines

– Perception: Ground-truth vs. DETIC

Instruction: “Move the cell phone from the chest of drawers to the counter panel.”

Ground-truth segmentation

DETIC segmentation: fails to detect the cell phone on the chest of drawers.

Conclusion: As expected, access to ground-truth semantics improves results.


– Finding object: RL vs. Heuristic policy

Instruction: “Move the teapot from the cabinet to the chair.”

RL FindObject

Heuristic FindObject: stops much farther from the object than its RL counterpart, causing the Gaze skill that follows to go astray.

Conclusion: The RL policy appears to do better at finding the object.