Thursday, August 14, 2025

AI2's MolmoAct: A 3D Robotics AI Model Challenging Nvidia and Google

The Allen Institute for AI (AI2) is stepping into the physical AI arena with its new open-source model, MolmoAct 7B. The model is designed to let robots reason in 3D space, posing a challenge to existing models from Nvidia and Google. MolmoAct, which builds on AI2's Molmo, can 'think' in three dimensions and is being released alongside its training data, under an Apache 2.0 license for the model and a CC BY 4.0 license for the datasets.

MolmoAct is classified as an Action Reasoning Model, which means it allows foundation models to reason about actions within a physical, 3D environment. According to AI2, MolmoAct's ability to reason in 3D space sets it apart from traditional vision-language-action (VLA) models, which typically lack this spatial reasoning capability. This enhanced capability allows robots to better understand and interact with their surroundings, leading to improved decision-making.

The model functions by outputting 'spatially grounded perception tokens,' which are pre-trained tokens extracted using a vector-quantized variational autoencoder. These tokens enable MolmoAct to gain a spatial understanding and encode geometric structures, allowing it to estimate distances between objects. Following this, the model predicts a sequence of 'image-space' waypoints and then outputs specific actions. AI2's research indicates that MolmoAct can adapt to different robot embodiments with minimal fine-tuning.
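The pipeline described above can be sketched as three stages: quantize perceptual features into discrete tokens, predict image-space waypoints, then convert waypoints into motion commands. The following is a minimal toy sketch of that flow; every function, codebook, and number here is hypothetical, standing in for components the real model learns with a transformer and a VQ-VAE.

```python
# Toy sketch of the three-stage decoding described above:
# perception tokens -> image-space waypoints -> low-level actions.
# All names and values are illustrative, not AI2's implementation.

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (toy VQ step)."""
    tokens = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

def plan_waypoints(start, goal, steps=3):
    """Linearly interpolate image-space (x, y) waypoints
    (a stand-in for the learned waypoint predictor)."""
    return [
        (start[0] + (goal[0] - start[0]) * t / steps,
         start[1] + (goal[1] - start[1]) * t / steps)
        for t in range(1, steps + 1)
    ]

def waypoints_to_actions(waypoints, current):
    """Turn consecutive waypoints into relative motion commands."""
    actions, pos = [], current
    for w in waypoints:
        actions.append((w[0] - pos[0], w[1] - pos[1]))
        pos = w
    return actions

# Toy end-to-end run: perceive, plan, act.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tokens = quantize([(0.9, 0.1), (0.1, 0.8)], codebook)
path = plan_waypoints(start=(0.0, 0.0), goal=(3.0, 3.0))
actions = waypoints_to_actions(path, current=(0.0, 0.0))
```

The point of the sketch is the separation of concerns: because the spatial understanding is encoded as discrete tokens before any action is produced, the action head can be retrained for a new robot embodiment without relearning perception, which is consistent with AI2's claim of adaptation via minimal fine-tuning.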

In benchmarking tests conducted by AI2, MolmoAct 7B achieved a task success rate of 72.1%, outperforming models from Google, Microsoft, and Nvidia. Experts in the field, such as Alan Fern of Oregon State University, view AI2's research as a natural progression in enhancing VLMs for robotics and physical reasoning. Daniel Maturana, co-founder of Gather AI, highlighted the open release of the data as a significant benefit, providing a strong foundation for further development and fine-tuning.

The development of MolmoAct reflects the increasing interest in physical AI and the quest to create more intelligent and spatially aware robots. LLM-based methods, like those used in MolmoAct, are making it easier for robots to determine possible actions based on the objects they are interacting with, moving closer to achieving general physical intelligence.

Original source: https://venturebeat.com/ai/ai2s-molmoact-model-thinks-in-3d-to-challenge-nvidia-and-google-in-robotics-ai/

Article generated by LaRebelionBOT
