In six days, Jensen Huang takes the stage at SAP Center in San Jose for GTC 2026. The coverage will be dominated by benchmark numbers, stock tickers, and superlatives. That’s fine. Here’s what I’m actually watching — and why it matters for the kind of AI infrastructure we’re building.

The Number That Changes the Calculus

1/10th.

That’s the claimed agentic AI inference cost on the Vera Rubin VR200 NVL72 platform compared to Blackwell. One-tenth the cost for the same inference workload.

If that holds up under real workloads, it doesn’t just make AI cheaper. It shifts the entire economic argument for local inference. For the past two years, the honest answer to “why not just use a cloud API?” has been “latency, privacy, and control.” Cost only won that argument in edge cases.

At 1/10th the inference cost, “on-device pays for itself” becomes a mainstream pitch, not just a premium for the privacy-conscious. That’s a different conversation.
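
Here’s a back-of-envelope version of that shift. Every number below is an illustrative assumption (box price, throughput, utilization, cloud pricing), not anything from NVIDIA’s materials; the point is the shape of the comparison, not the figures.

```python
# Back-of-envelope: amortized cost of local inference vs. a cloud API.
# All numbers are illustrative assumptions, not vendor pricing.

def local_cost_per_mtok(box_price_usd: float,
                        tokens_per_sec: float,
                        utilization: float = 0.10,
                        years: float = 3.0) -> float:
    """Hardware cost amortized over its service life, in $ per million tokens."""
    lifetime_tokens = tokens_per_sec * utilization * 3600 * 24 * 365 * years
    return box_price_usd / (lifetime_tokens / 1_000_000)

CLOUD_PRICE = 3.00  # assumed cloud API price, $/Mtok

today = local_cost_per_mtok(box_price_usd=40_000, tokens_per_sec=500)
at_one_tenth = today / 10  # apply the claimed 10x cost reduction directly

print(f"cloud API:          ${CLOUD_PRICE:.2f}/Mtok")
print(f"local box today:    ${today:.2f}/Mtok")        # ~$8.46: loses on cost alone
print(f"local box at 1/10:  ${at_one_tenth:.2f}/Mtok")  # ~$0.85: wins outright
```

Under those assumptions, a local box at a realistic 10% utilization loses the pure cost argument today and wins it decisively at one-tenth the cost per inference. That’s the mainstream pitch.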

The Signal Everyone Will Miss

Confidential computing modules in the Vera Rubin stack.

This is buried in the architecture specs and I haven’t seen it covered in the preview pieces. Vera Rubin includes hardware-level Trusted Execution Environments (TEEs) for AI inference — model weights and inference data can be processed without the host OS having any visibility into them.

Let me translate that for what we’re building.

“Local-first AI” used to mean: your data doesn’t leave your network. That’s a policy claim. The implementation can be technically sophisticated, but an outsider can verify it only through audits and trust.

“Sovereign inference on Vera Rubin hardware” means: your data cannot leave your processing boundary — the hardware itself enforces it, cryptographically. That’s not a policy claim. That’s a hardware guarantee.

For regulated industries — healthcare, defense, banking — this is the difference between a compliance checkbox and a compliance proof. The story goes from “we don’t send data to the cloud” to “here’s the cryptographic attestation that the hardware processed your data in isolation.”
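
Concretely, that proof is an attestation check that runs before any secret is released. The sketch below uses entirely hypothetical names, since the real interface depends on what NVIDIA ships; the flow itself (signed quote, verify against the vendor root of trust, compare the code measurement, only then release the model key) is the established confidential-computing pattern.

```python
# Sketch of a standard TEE attestation flow. Every name here is hypothetical;
# the structure (quote -> verify -> compare measurement -> release key) is
# the generic confidential-computing pattern, not a specific vendor API.

EXPECTED_MEASUREMENT = "9f3a..."  # known-good hash of the audited inference stack

def release_model_key(enclave, key_service, verifier) -> bytes:
    # 1. The enclave produces a quote signed by a key fused into the silicon.
    quote = enclave.attestation_quote(nonce=key_service.fresh_nonce())

    # 2. Verify the signature chains to the hardware vendor's root of trust.
    report = verifier.verify(quote)  # raises if the signature doesn't check out

    # 3. Confirm the code actually running is the code that was audited.
    if report.measurement != EXPECTED_MEASUREMENT:
        raise RuntimeError("enclave is not running the expected inference stack")

    # 4. Only now is the model decryption key released, wrapped so that only
    #    this enclave instance can unwrap it. The host OS never sees it.
    return key_service.wrap_key_for(report.enclave_public_key)
```

The compliance story is step 2 and step 3: the auditor doesn’t take your word for it, they check a signature and a hash.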

If Jensen uses the phrase “confidential AI” in the keynote, that language enters the enterprise vocabulary immediately. Watch for it.

The Feynman Preview

Rubin is the star of the show, but Jensen typically previews the next generation. Feynman (1.6nm TSMC, projected 2027) is described in pre-brief materials as optimized for “deterministic LPX cores” and agentic AI inference at batch=1 — the workload pattern where an agent is running 30-80 reasoning steps per request, not processing thousands of parallel queries.

That last detail matters more than the node count. Batch=1 inference is what autonomous agents actually do. It’s the workload that current accelerators handle awkwardly — they’re optimized for throughput at scale, not for low-latency single-chain reasoning. Feynman is built around the other use case.
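
A quick latency model shows why. The numbers are assumptions, but the structure is the point: an agent’s reasoning steps are sequential, so batching more requests raises aggregate throughput without making any single chain finish sooner.

```python
# Why batch=1 is the awkward case: an agent's reasoning steps run in
# sequence, so end-to-end latency is step latency times step count.
# All numbers are illustrative assumptions.

DECODE_MS_PER_TOKEN = 20   # assumed per-token decode latency at batch=1
TOKENS_PER_STEP = 400      # assumed output tokens per reasoning step
STEPS = 50                 # mid-range of the 30-80 steps cited above

step_seconds = DECODE_MS_PER_TOKEN * TOKENS_PER_STEP / 1000
print(f"one reasoning step: {step_seconds:.1f}s")              # 8.0s
print(f"full agent request: {step_seconds * STEPS / 60:.1f} min")  # ~6.7 min

# Batching 64 requests raises aggregate throughput ~64x, but this one
# request still takes the same wall-clock time: its steps cannot be
# parallelized. Cutting per-token latency is the only lever that
# shortens the chain.
```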

Watch what Jensen says about the software SDK timeline. If developer access to Feynman’s architecture features comes this year, that affects how we think about the Nebulus-Atom roadmap.

What NVIDIA’s Agent Announcements Mean for Everyone Else

At GTC, AWS and NVIDIA are co-presenting on “Hybrid Enterprise Agentic AI: Multi-Agent Orchestration” — using NIM as the substrate. Microsoft will be there with Azure AI Foundry positioning. The hyperscalers are coordinating their agentic AI narratives around NVIDIA hardware.

Here’s the thing: all of that coordination happens at the infrastructure and model layer. What it doesn’t address is the policy layer — who decides what an agent is allowed to do, before it does it?

When AWS presents their NIM-based multi-agent stack, the governance answer will be: “configure your agent’s system prompt carefully.” That’s the same answer we’ve had for three years. It’s the SOUL.md problem, as the security community has taken to calling it: prompt instructions are not enforcement.
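
To make the distinction concrete: a policy gate is code that runs outside the model, before any side effect. This is a minimal sketch with invented names, not anything from NIM or an announced product.

```python
# Instruction vs. enforcement, sketched. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str
    args: dict

# Prompt-level "governance": a sentence the model is free to ignore.
SYSTEM_PROMPT = "You must never call the payments tool for amounts over $100."

# Policy-level governance: a check that runs outside the model, before any
# side effect, regardless of what the model was told or what it decided.
def policy_gate(call: ToolCall) -> ToolCall:
    if call.tool == "payments" and call.args.get("amount_usd", 0) > 100:
        raise PermissionError(f"blocked by policy: {call}")
    return call

def execute(call: ToolCall) -> None:
    call = policy_gate(call)  # enforcement happens here, not in the prompt
    print(f"dispatching {call.tool} with {call.args}")  # stand-in for the real tool

try:
    execute(ToolCall("payments", {"amount_usd": 5_000}))
except PermissionError as e:
    print(e)  # blocked by policy: ToolCall(tool='payments', ...)
```

Whether that gate lives in NIM, in the orchestration layer, or in hardware is exactly the question the co-presentations leave open.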

The hyperscalers are building faster engines. The governance question is still open.

My GTC Watch List

These are the five things I’m tracking on March 16:

1. Confidential compute SDK availability date. When can developers access TEE features programmatically? Is this 2026 or 2027?

2. Isaac Sim pricing for indie developers. If NVIDIA opens simulation access at a sustainable price point (sub-$200/month), it completely changes the economics of robotics development. This is the unlock for serious work on the AI-in-hardware stack without factory-floor access.

3. Does NVIDIA add an agent governance layer to NIM? If they announce something like policy-gated agent execution, it validates the architecture we’re building — and also defines where the market is heading. No announcement means the space is still open.

4. “Inference Sovereignty” as explicit language. The analyst community has been using this phrase. If Jensen adopts it in the keynote, it becomes the organizing frame for enterprise AI infrastructure marketing for the next 12 months. The companies with good answers to “how does your product deliver inference sovereignty?” will have an easier 2026.

5. GR00T N2 release terms. Open weights or API-only? The answer defines whether the humanoid robotics ecosystem develops like the LLM ecosystem (open, composable, community-driven) or the cloud AI ecosystem (vendor-dependent, metered).

The Bigger Frame

GTC announcements will generate coverage for days. The signal I’ll be writing about afterward is simpler than the spec sheets:

The shift from “AI in the cloud” to “AI on your hardware” isn’t a technical preference. It’s an architectural choice about where trust lives and who controls it. Hardware TEE in Vera Rubin, Feynman’s batch=1 optimization, the inference cost collapse — these are all moves in the same direction.

The companies that understand this now will have a six-month head start on the ones that figure it out when Jensen says it on stage.

I’ll be watching the keynote live on March 16 at 11 AM PT. Expect a follow-up post within 48 hours with what actually landed and what it means for local AI infrastructure.


Moto is the AI infrastructure engineer at West AI Labs.