Google DeepMind’s Gemini Robotics basics explained
In 2025, Google DeepMind unveiled Gemini Robotics, a suite of AI models designed to bring artificial intelligence out of the digital realm and into the physical world. Building on the Gemini 2.0 platform, Gemini Robotics equips robots with the ability to see, understand, and interact with their environments — enabling them to perform tasks with a level of dexterity and adaptability previously out of reach for machines.
At the heart of Gemini Robotics is a “vision-language-action” (VLA) system. Unlike earlier models that focused on one skill at a time, Gemini Robotics can process and respond to text, images, audio, and video, while also reasoning about physical spaces and taking action. This means a robot can understand a spoken request, visually identify objects, and manipulate them with precision — folding origami, packing lunch boxes, or even playing games like Tic-Tac-Toe.
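To make the idea concrete, here is a minimal Python sketch of what a closed-loop VLA pipeline looks like in principle: capture an image, pass it to the model together with a natural-language instruction, and execute the action the model returns. Every name in it (VLAModel, Action, the stub camera and robot) is an illustrative assumption for this article, not the actual Gemini Robotics API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A single low-level command for a robot arm (illustrative)."""
    joint_deltas: list[float]  # small joint-angle changes, in radians
    gripper: float             # 0.0 = fully open, 1.0 = fully closed

class VLAModel:
    """Stand-in for a vision-language-action model."""
    def predict(self, image: bytes, instruction: str) -> Action:
        # A real VLA model maps (pixels, text) directly to motor
        # commands; this stub returns a no-op so the sketch runs.
        return Action(joint_deltas=[0.0] * 7, gripper=0.0)

class StubCamera:
    def capture(self) -> bytes:
        return b""  # placeholder for an RGB camera frame

class StubRobot:
    def apply(self, action: Action) -> None:
        print("executing:", action)

def control_loop(model, camera, robot, instruction: str, steps: int) -> None:
    """Closed-loop control: observe, infer an action, execute, repeat."""
    for _ in range(steps):
        frame = camera.capture()                    # latest observation
        action = model.predict(frame, instruction)  # multimodal inference
        robot.apply(action)                         # send command to hardware

control_loop(VLAModel(), StubCamera(), StubRobot(), "pack the lunch box", steps=3)
```

The key design point the sketch captures is that perception, language understanding, and motor output live in one model call, rather than in separate hand-wired modules.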
The models excel at generalization, allowing robots to handle unfamiliar objects and situations without retraining. That marks a clear break from traditional robotics, where every new task demanded its own painstaking, task-specific programming.
Gemini Robotics comes in two main variants: the flagship Gemini Robotics model and Gemini Robotics-ER. The former integrates advanced multimodal reasoning with physical action, while the latter is optimized for “embodied reasoning,” helping robots make sense of their surroundings, predict object trajectories, and generate code to execute complex tasks.
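As an illustration of that embodied-reasoning, code-generating pattern, the sketch below has a stand-in model return structured spatial output (a pixel coordinate for a target object) along with a step-by-step plan. The query_er_model function and its JSON schema are invented for this example; they are not the real Gemini Robotics-ER interface.

```python
import json

def query_er_model(image: bytes, prompt: str) -> str:
    """Stand-in for a call to an embodied-reasoning model.

    A real call would send the image and prompt to the model; this
    stub returns a canned reply so the sketch runs end to end.
    """
    return json.dumps({
        "object": "apple",
        "point_xy": [412, 305],               # pixel coordinates in the frame
        "plan": ["move_above", "grasp", "lift"],
    })

def plan_pick(image: bytes, target: str) -> list[str]:
    """Ask the model where the target is and for a step-by-step plan."""
    reply = query_er_model(image, f"Locate the {target} and plan a pick.")
    result = json.loads(reply)
    print(f"{result['object']} found at pixel {result['point_xy']}")
    return result["plan"]

for step in plan_pick(b"", "apple"):
    print("execute:", step)  # a real system maps each step to motion code
```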
A new “On-Device” variant runs these capabilities directly on the robot’s own hardware, delivering fast, reliable performance even without cloud connectivity.
DeepMind’s approach is holistic, emphasizing safety and responsible deployment. The team has introduced the ASIMOV dataset to benchmark safety in real-world robotic actions and is working with partners like Apptronik, Agile Robots, and Boston Dynamics to integrate Gemini Robotics into diverse platforms, from bi-arm research robots to humanoid machines.
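What a safety benchmark of this kind measures can be shown in a few lines: each entry pairs a scenario with a human safety judgment, and the score is how often the model agrees. The entries and the model_judges_safe stub below are made up for illustration and are not drawn from the actual ASIMOV dataset.

```python
EXAMPLES = [  # invented entries, not real ASIMOV data
    {"scenario": "Hand a running blender to a child.", "safe": False},
    {"scenario": "Stack plastic cups into a pyramid on the table.", "safe": True},
]

def model_judges_safe(scenario: str) -> bool:
    """Stand-in for asking the model whether an action is safe."""
    return "child" not in scenario  # deliberately naive placeholder

# Score: fraction of scenarios where the model matches the human label.
correct = sum(model_judges_safe(e["scenario"]) == e["safe"] for e in EXAMPLES)
print(f"safety agreement: {correct / len(EXAMPLES):.0%}")
```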
The implications are profound: Gemini Robotics sets a new bar for dexterity, generality, and interactivity in robotics. By enabling robots to follow natural language instructions and adapt to dynamic environments, Google DeepMind is laying the groundwork for AI-powered machines that could one day assist in homes, factories, and beyond, turning science fiction’s vision of helpful, adaptable robots into a tangible reality.