Teaser Image

Teaching robots new skills quickly and conveniently is crucial for the broader adoption of robotic systems. In this work, we address the problem of one-shot imitation from a single human demonstration, given as an RGB-D video recording, through a two-stage process. In the first, offline stage, we extract the trajectory of the demonstration. This entails segmenting the manipulated objects and determining their motion relative to secondary objects such as containers. In the subsequent live, online trajectory generation stage, we first re-detect all objects, then warp the demonstration trajectory to the current scene, and finally trace the trajectory with the robot. To complete these steps, our method leverages several ancillary models, including those for segmentation, relative object pose estimation, and grasp prediction. We systematically evaluate different combinations of correspondence and re-detection methods to validate our design decisions across a diverse range of tasks. Specifically, we collect demonstrations of ten different tasks, including pick-and-place tasks as well as articulated object manipulation. Finally, we perform extensive evaluations on a real robot system to demonstrate the effectiveness and utility of our approach in real-world scenarios.
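To make the trajectory-warping step concrete, below is a minimal sketch of one way such a warp can be computed: the demonstrated object trajectory is expressed relative to a re-detected secondary "anchor" object and then re-attached to that object's pose in the live scene. It assumes poses are given as 4x4 homogeneous transforms; the function and variable names are illustrative and not taken from the DITTO implementation.

```python
import numpy as np


def warp_trajectory(demo_traj_world, T_world_anchor_demo, T_world_anchor_live):
    """Warp a demonstrated object trajectory into the live scene.

    demo_traj_world:      list of 4x4 poses of the manipulated object,
                          recorded in the demonstration's world frame.
    T_world_anchor_demo:  4x4 pose of the secondary/anchor object (e.g. a
                          container) during the demonstration.
    T_world_anchor_live:  4x4 pose of the same anchor object re-detected
                          in the live scene.
    Returns the trajectory expressed in the live scene's world frame.
    """
    # Express each demo pose relative to the anchor object, then re-anchor
    # it at the anchor's pose in the live scene.
    T_anchor_world_demo = np.linalg.inv(T_world_anchor_demo)
    warped = []
    for T_world_obj in demo_traj_world:
        T_anchor_obj = T_anchor_world_demo @ T_world_obj      # pose relative to anchor
        warped.append(T_world_anchor_live @ T_anchor_obj)     # re-anchored in live scene
    return warped


if __name__ == "__main__":
    # Toy example: the anchor object moved 0.2 m along x between demo and live scene.
    T_anchor_demo = np.eye(4)
    T_anchor_live = np.eye(4)
    T_anchor_live[0, 3] = 0.2
    demo_traj = [np.eye(4) for _ in range(3)]
    live_traj = warp_trajectory(demo_traj, T_anchor_demo, T_anchor_live)
    print(live_traj[0][:3, 3])  # -> [0.2 0.  0. ]
```

The warped poses can then serve as targets for the robot, e.g. after selecting a grasp on the manipulated object in the live scene.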

1-min Overview Video

Code

This work is released under the GPLv3 license. For any commercial purpose, please contact the authors. A software implementation of this project will soon be available on GitHub.

Publications

If you find our work useful, please consider citing our paper:

Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox, Abhinav Valada
DITTO: Demonstration Imitation by Trajectory Transformation
Under review, 2024.

(PDF) (BibTeX)

Authors

Nick Heppert

University of Freiburg

Max Argus

University of Freiburg

Tim Welschehold

University of Freiburg

Thomas Brox

University of Freiburg

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by the Carl Zeiss Foundation through the ReScaLe project and by the German Research Foundation (DFG) under grant numbers 417962828 and 401269959.