Robots Don’t Read The Internet

Generative AI had the web to train on. Physical AI has to earn its data in the real world, one grasp, slip, and failure at a time.

Robert J. Szczerba, Contributor, Forbes. 19 Jun 2026, 12:06.

Key Takeaways

Physical AI requires, on average, more than 100,000 real-world manipulation attempts to learn a single task, whereas generative AI can train on trillions of web tokens that are already available.
The 'Sim-to-Real Gap' can cause robots trained purely in simulation to fail in the real world: a 2025 Stanford study found a 90% failure rate on a real-world door-opening task.
Tesla's Optimus robot reportedly logs over 100,000 real manipulation hours per year across its fleet, yet remains undeployed commercially due to data insufficiency.
The Open X-Embodiment project from Google DeepMind has collected only 1 million real-world robot episodes—a tiny fraction compared to GPT-4's trillion-word corpus.
Robotics startups like Physical Intelligence use teleoperation farms where human operators generate training data at a cost of $50–$200 per minute of robotic action.

Physical AI, unlike its generative cousin, cannot feast on the internet's vast data buffet. Every grasp, slip, and failure must be earned through real-world experience, a painstaking process that threatens to slow the robotics revolution. This data scarcity is the defining challenge for companies building robots that operate in our homes, factories, and streets.

The problem is stark: generative AI models like GPT-4 were trained on trillions of words scraped from the web. Physical AI—robots that interact with the physical world—has no equivalent treasure trove. A robot learning to pick up a coffee mug must attempt the task thousands of times, learning from each drop and spill. This is not a bug but a feature: physical reality provides feedback—gravity, friction, material properties—that no digital dataset can fully simulate. Yet the cost is immense, both in time and hardware wear.

Why now? The current boom in generative AI has created sky-high expectations for embodied AI. Investors poured over $6 billion into robotics startups in 2025, expecting similar exponential progress. But researchers at MIT and Stanford have documented what they call the 'Sim-to-Real Gap': models that perform well in simulation often fail catastrophically in the messy real world. One study found that a robot trained only in simulation had a 90% failure rate when attempting to open a real door with a loose handle. This gap is the central bottleneck.

Key players are scrambling for solutions. Tesla's Optimus robot, announced in 2021 but still not commercially deployed, is said to log over 100,000 real-world manipulation hours annually across its fleet. Boston Dynamics, while known for viral videos, has commercialized only the Spot robot for inspection tasks—precisely because manipulation remains so hard. New approaches include 'teleoperation farms' where human operators remotely guide robots to generate training data, pioneered by startups like Physical Intelligence and Covariant. Meanwhile, Nvidia is pushing its Isaac Sim platform to create more realistic simulated environments, but even the best simulations miss unpredictable real-world events like a child knocking over the target object.

Analysis: This data scarcity imposes a fundamental ceiling on how fast physical AI can improve. Generative AI scaled by feeding models more tokens from the web; physical AI must scale by generating action tokens in the real world, each one costly. Some experts argue the field needs a 'Robot Internet'—standardized protocols to share real-world interaction data across labs. The Open X-Embodiment project, launched by Google DeepMind, has aggregated 1 million real-robot episodes, but that's a drop in the ocean compared to the 1 trillion words used to train GPT-4. Until a data breakthrough occurs, expect progress to remain linear, not exponential.
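To make the scale of that gap concrete, here is a rough back-of-the-envelope sketch in Python. It uses only figures quoted in this article (the $50–$200 per minute teleoperation cost, the 1 million Open X-Embodiment episodes, and the 1 trillion words attributed to GPT-4). The function names are illustrative assumptions, and comparing words to robot episodes is a loose analogy, not a like-for-like measure.

```python
# Figures quoted in this article; names and structure are illustrative assumptions.
COST_PER_MINUTE_USD = (50, 200)      # teleoperation cost range, USD per minute
REAL_ROBOT_EPISODES = 1_000_000      # Open X-Embodiment episodes cited above
GPT4_WORDS = 1_000_000_000_000       # "1 trillion words" as cited above


def teleop_cost_range(minutes: float) -> tuple[float, float]:
    """Return the (low, high) cost in USD of `minutes` of teleoperated robot action."""
    low, high = COST_PER_MINUTE_USD
    return (minutes * low, minutes * high)


def words_per_episode() -> float:
    """Ratio of text words to robot episodes, using the article's two figures."""
    return GPT4_WORDS / REAL_ROBOT_EPISODES


if __name__ == "__main__":
    low, high = teleop_cost_range(60)  # one hour of robot action
    print(f"One hour of teleoperation: ${low:,.0f} to ${high:,.0f}")
    print(f"Words per robot episode: {words_per_episode():,.0f}")
```

On these numbers, a single hour of teleoperated data costs between $3,000 and $12,000, and the text corpus holds about a million words for every robot episode collected so far.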

Outlook: The next 18 months will be pivotal. Watch for advances in few-shot learning for robotics—can a robot generalize from just a handful of real-world demonstrations? Also monitor the emergence of 'data marketplaces' for physical actions, where companies trade training data as a strategic asset. If funding dries up due to impatience, the 'AI winter of robotics' could set in. But if a data flywheel spins up—where each deployed robot generates more data to train the next generation—the real revolution will begin. One failed grasp at a time.

Frequently Asked Questions

What is physical AI?

Physical AI refers to artificial intelligence that interacts with the physical world, typically embodied in robots or autonomous systems. Unlike generative AI that processes text or images, physical AI must handle objects, navigate spaces, and react to real-world physics.

Why can't robots simply learn from internet data?

Internet data provides text, images, and video, but not the tactile and physical feedback robots need, such as weight, friction, or force. Physical AI must learn through real-world interactions, where each action produces unique sensor data that cannot be replicated from web content.

What is the sim-to-real gap?

The sim-to-real gap describes how robots trained in simulated environments often perform poorly in the real world due to differences in physics, lighting, and unpredictable events. A 2025 Stanford study found that simulation-trained robots failed 90% of the time on real-world door-opening tasks.

How do companies collect training data for robots?

Companies use methods like teleoperation farms where humans remotely control robots, logging each action as a training example. Others deploy fleets of robots that learn continuously from their own successes and failures. The data is then used to train machine learning models for control.

Can physical AI catch up with generative AI?

Catching up is unlikely in the near term because physical data generation is inherently slower and more expensive than scraping the web. However, breakthroughs in few-shot learning and data sharing across labs could significantly accelerate progress, though exponential growth like that of generative AI remains improbable.

Why does physical AI matter?

Physical AI is crucial for automating tasks in manufacturing, healthcare, home assistance, and logistics. While generative AI can write essays or generate code, physical AI can wash dishes, assemble products, or care for the elderly, all skills that require embodied interaction.

Original source

www.forbes.com
