Make a Donut: Language-Guided Hierarchical EMD-Space Planning for Zero-shot Deformable Object Manipulation

Published in RA-L, 2025

Deformable object manipulation is a challenging area of robotics that often relies on demonstrations to learn task dynamics. However, suitable demonstrations for long-horizon tasks are difficult to obtain and can limit model generalization. We propose a demonstration-free hierarchical planning approach that handles complex long-horizon tasks without any training. Using large language models (LLMs), we create a high-level, stage-by-stage plan for a task, which specifies the tools to use and generates Python code that produces intermediate subgoal point clouds. With these subgoals, we employ a closed-loop model predictive control strategy using a Differentiable Physics with Point-to-Point correspondence (DiffPhysics-P2P) loss in the earth mover's distance (EMD) space. Our method outperforms benchmarks in dough manipulation tasks and generalizes well to novel tasks without demonstrations, as validated through real-world robotic experiments.
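To make the idea of "Python code for intermediate subgoal point clouds" concrete, here is a minimal sketch of what such a generated program might look like for a donut-shaped subgoal. It is not taken from the paper: the function name `sample_torus_point_cloud`, its parameters, and the default radii are all hypothetical, chosen only for illustration.

```python
import math
import random


def sample_torus_point_cloud(n_points, major_radius=0.1, minor_radius=0.03, seed=0):
    """Sample points inside a solid torus centered at the origin.

    Hypothetical helper for illustration only. The torus ring lies in the
    x-y plane and the z axis passes through its hole. Sampling is uniform
    over the tube cross-section, so it is not exactly uniform in volume.
    """
    rng = random.Random(seed)  # seeded so the subgoal is reproducible
    points = []
    for _ in range(n_points):
        theta = rng.uniform(0.0, 2.0 * math.pi)  # angle around the ring
        phi = rng.uniform(0.0, 2.0 * math.pi)    # angle around the tube
        # sqrt gives a uniform distribution over the tube's circular cross-section
        r = minor_radius * math.sqrt(rng.random())
        ring = major_radius + r * math.cos(phi)  # distance from the z axis
        points.append((ring * math.cos(theta), ring * math.sin(theta), r * math.sin(phi)))
    return points


# Example: a 500-point donut subgoal that a planner could use as a target shape.
donut_subgoal = sample_torus_point_cloud(500)
```

A point cloud like `donut_subgoal` would play the role of a target shape for one stage of the plan. The low-level controller would then try to drive the current dough particles toward it.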

Recommended citation: You, Y., Shen, B., Deng, C., Geng, H., Wang, H., & Guibas, L. (2023). Make a Donut: Language-Guided Hierarchical EMD-Space Planning for Zero-shot Deformable Object Manipulation. arXiv preprint arXiv:2311.02787.

Yang You (尤洋)
