Current humanoids have little chance of success, says famed roboticist Rodney Brooks
Today's training methods are woefully inadequate and the massive sensory-data gap won't be closed any time soon, says the co-founder of iRobot.

Leading humanoid robotics companies face a fundamental challenge that billions in venture capital may not solve, according to Rodney Brooks, the MIT roboticist who co-founded iRobot. His latest analysis targets the video-based training approaches championed by firms like Figure and Tesla, arguing they overlook a critical component of human dexterity: touch.
Brooks contends that companies pursuing humanoid robots are making a costly miscalculation. Both Figure and Tesla have shifted toward vision-only training methods, recording workers performing tasks with multi-camera rigs and using this footage to teach robots to mimic human movements. Tesla explicitly abandoned motion-capture suits and teleoperation in favor of this approach, while Figure promotes learning “directly from everyday human video.”
The core of Brooks’ argument rests on human physiology. He points to research showing human hands contain approximately 17,000 mechanoreceptors and 15 distinct families of touch-sensing neurons. These sensors detect pressure, vibration, texture, and minute force variations that enable dexterous manipulation. Laboratory experiments demonstrate this dependency: when fingertips are numbed, simple tasks like lighting a match become four times slower and significantly more difficult.
Current humanoid training methods capture none of this tactile information. According to Brooks, video-based systems lack force feedback at the wrists, provide limited finger control precision, and operate without any sense of touch. This data gap may prove insurmountable for achieving human-level dexterity.
Brooks challenges the assumption that brute-force learning will succeed where traditional robotics has struggled. He argues that celebrated artificial intelligence breakthroughs in speech recognition, image processing, and large language models were not truly “end-to-end.” Instead, each relied on sophisticated, domain-specific preprocessing that mimicked human sensory systems.
Speech-to-text systems use filtering and frequency analysis originally developed for telephone networks. Image recognition employs convolutional neural networks that replicate visual cortex structures. Language models depend on tokenization and embedding techniques rooted in human linguistic understanding.
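To make the preprocessing point concrete, here is a minimal sketch of the kind of frequency-analysis front end speech systems use before any learning happens: the raw waveform is sliced into short overlapping frames and converted to per-frame frequency magnitudes with an FFT. The frame length, hop size, and sample rate below are illustrative choices for this example, not parameters of any particular product.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return per-frame FFT magnitudes (frames x frequency bins)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    return np.array([np.abs(np.fft.rfft(f * window)) for f in frames])

# One second of a 440 Hz tone sampled at 8 kHz (telephone-band rate).
sr = 8000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(audio)
peak_bin = spec[0].argmax()
peak_hz = peak_bin * sr / 256  # energy concentrates near 440 Hz in every frame
```

A neural model then consumes `spec`, not `audio`: the hard signal-processing work of exposing frequency structure is done by this hand-designed stage, which is the sense in which such systems are not truly end-to-end.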
The implications extend beyond technical challenges. Brooks suggests the industry may be pursuing an expensive dead end, with “hundreds of millions, or perhaps many billions of dollars” at risk. His timeline estimates place practical humanoid deployment more than a decade away, potentially stranding current investment approaches.
The roboticist’s credibility in this debate is substantial. His previous ventures produced successful commercial robots, including the Roomba and industrial models deployed in factories worldwide. His criticism carries weight partly because he has navigated both academic research and commercial robot manufacturing.
Brooks predicts that viable humanoid robots will eventually emerge, but in forms bearing little resemblance to current designs or to human anatomy. Companies may need to abandon anthropomorphic designs and video-training methods to achieve practical results; if so, today’s humanoid robot investments could prove as obsolete as their underlying assumptions.
The stakes involve more than venture capital returns. If Brooks proves correct, the timeline for humanoid robots entering workplaces and homes extends considerably, affecting labor market assumptions and automation investment strategies across industries.