RIEM News

New Microsoft AI trains robots to know what to do and where to act

Source: interestingengineering
Author: @IntEngineering
Published: 3/26/2026

Microsoft and a group of academic researchers have introduced GroundedPlanBench, a new benchmark designed to address a key challenge in robotics: enabling robots to simultaneously decide what actions to take and where to perform them. Traditional systems separate these tasks into two stages, first generating a natural language plan and then converting it into actions, which often leads to errors, especially in cluttered or ambiguous environments. GroundedPlanBench links each action directly to specific locations in images, grounding basic tasks like grasping or placing objects in spatial context. The benchmark includes over 1,000 tasks derived from real robot interactions, featuring both straightforward and open-ended instructions to reflect real-world ambiguity that often confuses robots.

To improve robotic planning and execution, the team developed a training method called Video-to-Spatially Grounded Planning (V2GP), which learns from videos of robots performing tasks by detecting object interactions and tracking their positions. This approach generated more than 40,000 grounded plans ranging from simple to complex multi-step sequences.
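
As a rough illustration of what "linking each action to a location in an image" could mean in practice, the Python sketch below pairs each plan step with a pixel bounding box. The schema, field names, and example values are all hypothetical; the article does not describe the actual data format used by GroundedPlanBench or V2GP.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical schema for illustration only; not the benchmark's real format.
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class GroundedStep:
    action: str  # what to do, e.g. "grasp" or "place"
    target: str  # free-text description of the object or surface
    box: Box     # where to act: a region of the camera image

    def center(self) -> Tuple[float, float]:
        """Return the midpoint of the box, a simple candidate action point."""
        x0, y0, x1, y1 = self.box
        return ((x0 + x1) / 2, (y0 + y1) / 2)


def is_valid(plan: List[GroundedStep], width: int, height: int) -> bool:
    """Check that every step's box has positive area and lies inside the image."""
    for step in plan:
        x0, y0, x1, y1 = step.box
        if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
            return False
    return True


# A made-up two-step plan: each action carries its own image location.
plan = [
    GroundedStep("grasp", "red mug", (120, 80, 200, 160)),
    GroundedStep("place", "shelf", (300, 40, 420, 100)),
]
```

In this sketch the plan and its grounding travel together, so a step that names an object but points outside the image can be rejected before execution, which is one way to picture the contrast with the two-stage approach described above.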

Tags

robotics, artificial-intelligence, robot-planning, spatial-reasoning, machine-learning, robot-interaction, vision-language-models