Thursday, 12 March 2026

Nvidia's Nemotron 3 Super Revolutionises AI Efficiency

Nvidia has unveiled Nemotron 3 Super, a groundbreaking 120-billion-parameter hybrid AI model designed to tackle one of enterprise artificial intelligence's most pressing challenges: the explosive cost of multi-agent systems. These systems, which handle complex, long-horizon tasks such as software engineering and cybersecurity triage, can generate up to 15 times the token volume of standard chatbots, making them prohibitively expensive for many organisations. Nvidia's latest release aims to deliver the depth required for sophisticated agentic workflows without the typical computational bloat, whilst maintaining commercial viability through an open-weights licence.


At the heart of Nemotron 3 Super lies a sophisticated triple hybrid architecture that represents a significant departure from traditional AI model design. The system combines three distinct architectural approaches: state-space models, transformers, and a novel Latent Mixture-of-Experts design. The Hybrid Mamba-Transformer backbone interleaves Mamba-2 layers—which handle the bulk of sequence processing with linear-time complexity—with strategically placed Transformer attention layers that act as "global anchors" for precise fact retrieval. This architecture enables the model to maintain a massive 1-million-token context window without the memory footprint typically associated with such capabilities, solving the classic "needle in a haystack" problem that plagues many enterprise applications.
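To make the backbone design concrete, here is a minimal sketch of the idea described above: a layer schedule that is mostly linear-time Mamba-2 mixers with occasional attention layers as global anchors, plus a rough estimate of why this shrinks long-context memory. The attention-to-Mamba ratio, head counts, and dimensions are illustrative assumptions, not Nemotron 3 Super's actual configuration.

```python
def build_layer_pattern(n_layers: int, attention_every: int = 6) -> list[str]:
    """Return a layer schedule: mostly 'mamba2', with a periodic
    'attention' layer acting as a global anchor for fact retrieval."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba2"
        for i in range(n_layers)
    ]


def kv_cache_gib(n_attn_layers: int, seq_len: int = 1_000_000,
                 n_kv_heads: int = 8, head_dim: int = 128,
                 bytes_per_elt: int = 2) -> float:
    """Rough KV-cache size in GiB. Only attention layers pay per-token
    memory; Mamba-2 layers keep a fixed-size state regardless of context
    length, so a mostly-Mamba stack slashes the long-context footprint."""
    # Factor of 2 covers the separate key and value tensors.
    return (2 * n_attn_layers * seq_len * n_kv_heads * head_dim
            * bytes_per_elt) / 2**30


if __name__ == "__main__":
    schedule = build_layer_pattern(12)
    print(schedule)
    # Compare a hybrid stack (2 attention layers of 12) with full attention.
    print(f"hybrid: {kv_cache_gib(2):.1f} GiB, "
          f"all-attention: {kv_cache_gib(12):.1f} GiB")
```

Under these assumed shapes, a 1-million-token cache scales linearly with the number of attention layers, which is why interleaving only a few of them keeps the footprint manageable.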

The Latent Mixture-of-Experts component further distinguishes Nemotron 3 Super from its competitors. Traditional MoE designs route tokens to experts in their full hidden dimension, creating computational bottlenecks as models scale. Nvidia's LatentMoE innovation projects tokens into a compressed space before routing, allowing the model to consult four times as many specialists for the same computational cost. This granularity proves vital for agents that must seamlessly switch between different reasoning modes—Python syntax, SQL logic, and conversational understanding—within a single interaction. Additionally, Multi-Token Prediction serves as a built-in draft model, enabling native speculative decoding that delivers up to three times faster wall-clock speeds for structured generation tasks.
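The routing trick can be sketched in a few lines: compress each token into a smaller latent space, then score and select experts there, so the router can afford to consider many more specialists per token. All dimensions, expert counts, and matrix names below are illustrative assumptions, not the model's real configuration.

```python
import numpy as np

rng = np.random.default_rng(0)


def latent_moe_route(x, down_proj, router_w, top_k=2):
    """Route tokens to experts in a compressed latent space.

    x:         (batch, d_model) token activations
    down_proj: (d_model, d_latent) compression matrix
    router_w:  (d_latent, n_experts) router weights
    Returns the latent tokens and the top_k expert indices per token.
    """
    z = x @ down_proj                    # compress before routing
    logits = z @ router_w                # routing scores in latent space
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # pick top-k experts
    return z, top


d_model, d_latent, n_experts = 64, 16, 32
x = rng.standard_normal((4, d_model))
z, experts = latent_moe_route(
    x,
    rng.standard_normal((d_model, d_latent)),
    rng.standard_normal((d_latent, n_experts)),
)
print(z.shape, experts.shape)  # (4, 16) (4, 2)
```

Because the routing matmul runs at the latent width rather than the full hidden width, the router's cost per expert drops by roughly d_model/d_latent, which is the headroom that lets the model consult several times as many specialists for similar compute.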

Perhaps the most significant technical advancement is Nemotron 3 Super's optimisation for Nvidia's Blackwell GPU platform. Pre-trained natively in NVFP4 (4-bit floating point), the model achieves four times faster inference than 8-bit models running on the previous Hopper architecture, with no accuracy loss. In practical performance, the model currently holds the number one position on DeepResearch Bench and demonstrates throughput advantages of up to 2.2 times higher than GPT-OSS-120B and 7.5 times higher than Qwen3.5-122B in high-volume settings. Major enterprises including Siemens, Palantir, CodeRabbit, and Greptile are already integrating the model for large-scale codebase analysis and complex workflow automation across manufacturing and cybersecurity applications.
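To illustrate what 4-bit floating point means in practice, here is a toy quantiser that snaps weights onto the FP4 E2M1 grid (the 4-bit format family NVFP4 belongs to) with a per-block absolute-max scale. The block-scaling recipe here is a generic assumption for illustration, not Nvidia's exact NVFP4 implementation.

```python
import numpy as np

# Representable magnitudes of an E2M1 4-bit float (plus a sign bit):
# only eight distinct positive values exist in the whole format.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def quantize_fp4(block: np.ndarray) -> np.ndarray:
    """Quantise a 1-D block: scale into the FP4 range, snap each value
    to the nearest representable magnitude, then rescale."""
    scale = max(np.abs(block).max() / FP4_GRID[-1], 1e-12)
    scaled = np.abs(block) / scale
    # Nearest-neighbour search over the tiny FP4 grid.
    idx = np.abs(scaled[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(block) * FP4_GRID[idx] * scale


weights = np.array([0.31, -1.2, 0.05, 2.4])
print(quantize_fp4(weights))  # [ 0.4 -1.2  0.   2.4]
```

Each stored value costs only 4 bits plus a shared per-block scale, which is where the memory and throughput gains over 8-bit formats come from; the engineering challenge Nvidia claims to have solved is pre-training natively at this precision without accuracy loss.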

Original source: https://venturebeat.com/technology/nvidias-new-open-weights-nemotron-3-super-combines-three-different


Article generated by LaRebelionBOT
