Nvidia has launched Cosmos 3, an open world foundation model for physical AI that the company says can natively understand and generate text, images, video, ambient sound and actions while delivering high levels of physics-based accuracy.
According to Nvidia, Cosmos 3 is the first fully open “omnimodel” designed to accelerate the development of physical AI systems by reducing training and evaluation cycles from months to days. The model is intended to help robots, autonomous vehicles and other AI-powered systems better understand, simulate and predict real-world environments.
“The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models,” said Nvidia CEO Jensen Huang. “The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, autonomous vehicles and vision AI that perceive, reason, plan and act in the physical world.”
Nvidia said it trained Cosmos 3 on 20 trillion tokens of multimodal data, including nearly a billion images, 400 million real and synthetic videos, ambient audio, text and action data from humans and robots.
Ming-Yu Liu, VP of Nvidia’s Cosmos Lab, told Axios that the action data is what makes Cosmos different from a regular video generator, as it’s meant to model how machines move, not just how scenes look.
Liu also said that Cosmos is an open model, like Nvidia’s earlier Nemotron family, which makes it easier for hardware makers to customize Cosmos to their needs and helps ensure that future versions align more closely with the needs of the industry.
Nvidia is also launching the Nvidia Cosmos Coalition, a global collaboration between world model builders and AI developers, including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI. The company also says Cosmos can generate rare or dangerous scenarios — such as robot collisions or unusual road events — which are difficult, expensive or unsafe to capture repeatedly.
Two versions of the model are being released immediately: a “super” model for tasks requiring high physics accuracy, such as training robots and autonomous vehicles, and a “nano” model that can generate results in fractions of a second. Nvidia also said an “edge” model that can run locally is coming soon.
World models have recently become a focus for AI companies, which increasingly want to use AI for physical applications. “Ultimately what a world model wants to achieve is to help physical agents to become more generalizable,” Liu said. “To become more generalizable, you need to understand the world so you understand how it works, so you can make a plan.”
Nvidia, in partnership with Microsoft, is also expected to unveil the first Windows computers powered by Nvidia-designed processors next week. The new devices are expected to be showcased at both the Computex technology exhibition in Taiwan and the Microsoft Build developer conference in San Francisco.

