Teaching robot policies without new demonstrations: interview with Jiahui Zhang and Jesse Zhang - Robohub

Source: robohub
Published: 12/4/2025
The article discusses the ReWiND framework, introduced by Jiahui Zhang, Jesse Zhang, and colleagues in their CoRL 2025 paper, which enables robots to learn manipulation policies for novel, language-specified tasks without requiring new task-specific demonstrations. ReWiND operates in three stages. First, it learns a dense reward function from a small set of demonstrations in the deployment environment by predicting per-frame progress toward task completion; a novel video-rewind augmentation synthetically generates sequences that simulate both progress and failure, improving the reward model's accuracy and generalization. Second, the framework pre-trains a policy offline by relabeling the demonstration data with these dense rewards. Finally, the pre-trained policy is fine-tuned online in the deployment environment on unseen tasks, with the frozen reward function providing feedback in place of additional demonstrations.
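The rewind augmentation is the distinctive trick in stage one: it teaches the reward model what failure and regression look like without collecting any failure data. Below is a minimal sketch of one plausible implementation; the function name, array layout, and linear progress-target schedule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def rewind_augment(frames: np.ndarray, rng: np.random.Generator):
    """Turn a successful demo clip into a synthetic failure clip by playing
    it forward to a random frame and then in reverse, so the ground-truth
    per-frame progress rises and then falls back toward zero.

    frames: (T, H, W, C) demo video (layout is an assumption).
    Returns the augmented clip and per-frame progress targets in [0, 1].
    """
    T = len(frames)
    t_split = int(rng.integers(2, T))       # frame where the rollout "rewinds"
    forward = frames[:t_split]              # genuine progress segment
    backward = frames[:t_split - 1][::-1]   # rewound segment: progress undone
    clip = np.concatenate([forward, backward], axis=0)

    # Targets: progress grows linearly with the fraction of the demo
    # completed, then mirrors back down over the rewound frames
    # (one simple choice of schedule).
    up = np.linspace(0.0, t_split / T, t_split)
    down = up[:-1][::-1]
    targets = np.concatenate([up, down])
    return clip, targets

# Usage on a placeholder 12-frame demo:
rng = np.random.default_rng(0)
clip, targets = rewind_augment(np.zeros((12, 64, 64, 3)), rng)
```

Training a progress predictor on both the original clips and these rewound clips exposes it to trajectories where progress decreases, which is what lets it penalize failed behavior at fine-tuning time.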
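Stage two relabels the same demonstrations with the learned reward so the policy can be pre-trained offline. A sketch follows, assuming a frozen reward_model(frames, instruction) that returns per-frame progress; using the per-step progress delta as the dense reward is one common shaping choice, not necessarily the paper's exact formulation.

```python
from typing import Callable, List, Sequence, Tuple

def relabel_demos(
    demos: Sequence[Tuple[list, list]],                # (frames, actions) per demo
    reward_model: Callable[[list, str], List[float]],  # frozen progress predictor
    instruction: str,
) -> List[Tuple[object, object, float]]:
    """Relabel demo transitions with dense rewards for offline pre-training."""
    transitions = []
    for frames, actions in demos:
        progress = reward_model(frames, instruction)
        for t, action in enumerate(actions):
            # Reward = predicted progress gained by this step (an assumption;
            # using the raw progress value would be another reasonable choice).
            reward = progress[t + 1] - progress[t]
            transitions.append((frames[t], action, reward))
    return transitions
```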
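In stage three, the pre-trained policy adapts online to an unseen task, with the frozen reward model standing in for demonstrations. The sketch below assumes hypothetical env, policy, and reward_model interfaces; only the control flow mirrors the description above.

```python
def finetune_online(env, policy, reward_model, instruction: str,
                    episodes: int = 50, horizon: int = 200) -> None:
    """Fine-tune on an unseen, language-specified task using only the frozen
    reward model for feedback; no new task-specific demonstrations needed."""
    for _ in range(episodes):
        obs = env.reset()                  # hypothetical deployment-env API
        frames, steps = [obs], []
        for _ in range(horizon):
            action = policy.act(obs, instruction)
            obs, done = env.step(action)
            frames.append(obs)
            steps.append((frames[-2], action))
            if done:
                break
        # Score the rollout with the frozen reward model (per-frame progress).
        progress = reward_model(frames, instruction)
        rewards = [progress[t + 1] - progress[t] for t in range(len(steps))]
        policy.update(steps, rewards)      # any standard RL policy update
```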
The researchers evaluated ReWiND in both simulated (MetaWorld) and real-world (Koch) environments, focusing on the reward model's generalization and the policy's ability to adapt to unseen tasks during online fine-tuning.
Tags
robotics, robot-learning, reinforcement-learning, language-guided-robotics, robot-manipulation, reward-function-learning, policy-adaptation