Teaching robot policies without new demonstrations: interview with Jiahui Zhang and Jesse Zhang - Robohub

Source: robohub
Published: 12/4/2025
To read the full content, please visit the original article.
The article discusses the ReWiND framework introduced by Jiahui Zhang, Jesse Zhang, and colleagues in their CoRL 2025 paper, which enables robots to learn manipulation policies for new language-specified tasks without requiring new demonstrations. ReWiND operates in three phases. First, it learns a dense reward function from a small set of demonstrations in the deployment environment by predicting per-frame progress toward task completion; a novel video rewind augmentation technique synthetically generates sequences simulating both progress and failure to improve the reward model's accuracy and generalization. Second, the reward function is used to relabel the demonstration data with dense rewards, enabling offline reinforcement learning to pre-train a policy. Finally, the pre-trained policy is fine-tuned online on unseen tasks using the frozen reward function as feedback, allowing continual adaptation without additional demonstrations.
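To make the rewind idea concrete, here is a minimal sketch of what such an augmentation could look like. This is an illustrative reconstruction, not the authors' code: the function names, the linear per-frame progress labels, and the random split point are all assumptions. The key idea from the paper is that reversing a demonstration's tail yields frames whose progress *decreases*, giving the reward model failure-like examples without collecting new data.

```python
# Hypothetical sketch of a ReWiND-style "video rewind" augmentation.
# Names and labeling conventions are assumptions, not the authors' code.
import random

def progress_labels(n):
    """Per-frame progress toward task completion, from 0.0 to 1.0."""
    return [i / (n - 1) for i in range(n)]

def rewind_augment(frames, rewind_at=None):
    """Play a demonstration forward up to a split point, then rewind it.

    The reversed tail simulates a policy regressing back toward the start,
    so the reward model sees sequences whose progress labels rise and then
    fall, i.e. synthetic failure behavior derived from a success demo.
    """
    n = len(frames)
    labels = progress_labels(n)
    if rewind_at is None:
        rewind_at = random.randrange(1, n)        # random split point
    forward = list(range(rewind_at + 1))          # indices 0 .. rewind_at
    backward = list(range(rewind_at - 1, -1, -1)) # rewind_at-1 .. 0
    idx = forward + backward
    aug_frames = [frames[i] for i in idx]
    aug_labels = [labels[i] for i in idx]         # rise, then fall back
    return aug_frames, aug_labels
```

For example, rewinding a five-frame demo at frame 2 replays frames 0-1-2 and then steps back through 1-0, with the progress labels mirroring that trajectory. A reward model regressed onto such labels learns to assign low reward to undoing progress, which is what makes it usable as dense feedback during the online fine-tuning phase.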
The researchers validated ReWiND through experiments both in the MetaWorld simulation benchmark and on a real-world robotic setup (Koch), focusing on the reward model's ability to generalize to unseen tasks.
Tags
robotics, robot-learning, reinforcement-learning, language-guided-robotics, robot-manipulation, reward-function-learning, policy-adaptation