Gemini 3 Crushes Benchmarks, ChatGPT Needs Humans, Anthropic's New Alliances, & OpenAI Overplays Their Hand
Google proves out their TPU infrastructure with Gemini 3, ChatGPT's group chats show that enterprises still need people, and Anthropic cashes in on OpenAI overplaying their hand with Nvidia
Gemini 3 Proves Google’s Vertical Integration Strategy & Validates Nvidia
Google released Gemini 3 Pro on November 18th, and the model dominates virtually every major AI benchmark—except one that matters.
Gemini 3 Pro scored 37.4% on Humanity’s Last Exam (the previous record was GPT-5 Pro at 31.64%) and claimed the top spot on LMArena with 1,501 Elo. The model crushes GPT-5.1 and Claude Sonnet 4.5 across math, reasoning, and multimodal benchmarks. On ARC-AGI-2, Gemini 3 Deep Think hit 45.1%, triple the next-best model.
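For context on the LMArena number: Elo ratings translate directly into head-to-head win probabilities. Here's a minimal sketch of the standard Elo expected-score formula; the 1,450-rated opponent is a made-up example, not a real LMArena entry.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Hypothetical head-to-head: a 1,501-rated model vs. a 1,450-rated one.
p = elo_expected_score(1501, 1450)
print(round(p, 3))  # ≈ 0.573
```

A roughly 50-point Elo gap translates to only about a 57% expected win rate, so even a clear #1 on the leaderboard is winning individual matchups by a narrower margin than the benchmark deltas might suggest.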
Gemini 3 also stood out for another reason: speed. Early reviews consistently highlight that its "intelligence per second" blows away competitors. Matt Shumer called it his new daily driver because it "often outperforms GPT-5 Pro without the 5-10 minute wait."
The exception? SWE-Bench Verified, the real-world coding benchmark where Claude Sonnet 4.5 still leads at 77.2%. Anthropic’s secret sauce in coding is driving most of their enterprise revenue, and that moat holds, for now.
Google wins by proving their vertically integrated solution works. They trained Gemini 3 exclusively on their TPUs, delivering best-in-class models with the lowest cost structure in the industry. No Nvidia dependency means better margins.
Anthropic wins because they’re partnering with Google’s TPUs (more on that below) and still maintain best-in-class coding capabilities despite Gemini’s dominance elsewhere. Their Claude Sonnet 4.5 remains the enterprise choice for software development.
Nvidia counterintuitively wins too. Google just demonstrated that throwing more compute at the problem works; everyone else can catch up with enough chips. This chart sums it up well.
Notice how Gemini 3's scores blow out the competition (Y-axis, score %), but you're paying for it (X-axis, cost per task). For everyone else, that compute comes from Nvidia.
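The chart's takeaway is a cost/quality trade-off, which is really just a Pareto frontier over (cost per task, score): a model matters if nothing else beats it on both axes at once. A sketch of that calculation, using illustrative placeholder numbers rather than real benchmark data:

```python
def pareto_frontier(models):
    """Return names of models not dominated by another model
    that scores at least as high at no greater cost."""
    frontier = []
    for name, cost, score in models:
        dominated = any(
            other_cost <= cost
            and other_score >= score
            and (other_cost, other_score) != (cost, score)
            for _, other_cost, other_score in models
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Placeholder (cost per task in $, score %) points, not real data.
models = [
    ("model_a", 1.0, 60.0),  # cheap, mid score: on the frontier
    ("model_b", 4.0, 75.0),  # pricey, top score: on the frontier
    ("model_c", 3.0, 55.0),  # worse than model_a on both axes
]
print(pareto_frontier(models))  # ['model_a', 'model_b']
```

A model that is expensive *and* outscored (like model_c here) falls off the frontier entirely, which is the position every lab is trying to avoid as compute costs climb.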
Apple’s white-labeling decision looks prescient. We wrote about Apple paying Google roughly $1 billion annually for a custom 1.2 trillion parameter Gemini model to power the revamped Siri. Apple evaluated Anthropic (better model, higher fees) and OpenAI before choosing Google based on price and their existing $20 billion search relationship. With Gemini 3’s performance, that call looks even smarter.
AWS Trainium looks worse by comparison. Google's TPUs are producing frontier models. Anthropic is training on Google's infrastructure and Microsoft's Nvidia GPUs. AWS has announced Trainium2 with impressive specs but hasn't announced a model this capable trained on their chips.
The message is clear: Google figured out custom silicon for AI training, AWS has not.
ChatGPT’s Group Chats Reveal the Human-in-the-Loop Reality
OpenAI launched group chats in ChatGPT, currently piloting in Japan, New Zealand, South Korea, and Taiwan. Up to 20 people can collaborate with ChatGPT in a single conversation, powered by GPT-5.1 Auto.
But this isn’t what enterprises were hoping for with AI.
The ultimate enterprise dream has been a single, all-powerful human manager overseeing a swarm of independent, autonomous AI agents: basically a digital workforce running on autopilot, much like how ERPs replaced back offices.
Instead, the productivity boost is coming from a team of people collaborating with the same tool, in this case ChatGPT. The work is shared and transparent, and it can be handed off without losing context.
In other words, OpenAI is admitting that their LLMs are not ready for fully autonomous agents yet. LLMs are making people more productive, like spreadsheets did for accountants and bankers.
That’s powerful, but it requires more humans in the loop than enterprises hoped for.
If autonomous agent swarms were working, OpenAI would be shipping those interfaces. Instead, they’re building collaboration tools that assume multiple humans working alongside AI.
Microsoft, Nvidia, and Anthropic Partner While OpenAI Overplays Their Hand
Microsoft, Nvidia, and Anthropic announced strategic partnerships that reshape the AI infrastructure landscape, while exposing OpenAI’s weakening position.
Anthropic committed $30 billion for Azure compute capacity and up to 1 gigawatt of additional capacity. Nvidia is investing up to $10 billion in Anthropic, Microsoft up to $5 billion. The deal values Anthropic at roughly $350 billion, up from $183 billion in September.
We also learned something big after Nvidia’s earnings: their OpenAI partnership was only a letter of intent, no contract was signed.
From Nvidia's 10-Q:
We expect to continue investing in strategic partnerships. In the third quarter of fiscal year 2026, we entered into a letter of intent with an opportunity to invest in OpenAI. In November 2025, we entered into an agreement, subject to certain closing conditions, to invest up to $10 billion in Anthropic.
There is no assurance that we will enter into definitive agreements with respect to the OpenAI opportunity or other potential investments, or that any investment will be completed on expected terms, if at all.
The timing and magnitude of these and other investments we may make will depend on various factors, including the ability of our partners to successfully develop and deploy AI infrastructure.
There can be no certainty as to the timing or amount of capital we may ultimately invest in these or other strategic partnerships, and we may be limited by our available liquidity and capital resources.
Sam Altman signed deals with AMD and Broadcom after this letter of intent, and now it appears Nvidia is walking away. Altman may have overplayed his hand by committing to AMD and Broadcom before executing a contract with Nvidia, making OpenAI's massive infrastructure buildout significantly harder to finance.
Anthropic On All Three Clouds
Anthropic is now the only frontier lab with models available across all three major clouds (AWS, Azure, GCP).
We've previously written about Anthropic diversifying from AWS to avoid an OpenAI-style Microsoft dependency. But Gemini 3's release suggests Anthropic may have more motives.
Google has best-in-class TPUs, Microsoft has more Nvidia GPUs, and AWS can't figure out Trainium. By securing compute from Microsoft, Nvidia, and Google, Anthropic ensures they stay on best-in-class infrastructure, which is less likely on AWS.
Anthropic just keeps building on their momentum, cornering the enterprise and now securing more infrastructure. All while Nvidia leaves OpenAI in an unfamiliar position: playing catch-up.