Google's New TPUs, Meta's Llama 4, Grok APIs, Midjourney V7 Answers OpenAI
Google dropped their new TPU, Meta launched the much anticipated Llama 4, Grok finally opened their APIs, Midjourney answers OpenAI, and LLMs keep getting cheaper
Google’s Ironwood TPU Builds on Their Differentiation
Google just dropped Ironwood, its seventh-generation Tensor Processing Unit (TPU). Designed specifically for AI inference, Ironwood can scale up to 9,216 chips, delivering 42.5 exaflops of compute power.
Here’s how Ironwood compares to Google’s prior TPUs.
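A quick back-of-the-envelope check on the numbers above: dividing the full pod’s 42.5 exaflops by its 9,216 chips implies roughly 4.6 petaflops per chip. A minimal sketch of that arithmetic:

```python
# Implied per-chip throughput from the pod-level figures cited above:
# 42.5 exaflops spread across a 9,216-chip Ironwood pod.
POD_EXAFLOPS = 42.5
POD_CHIPS = 9_216

# 1 exaflop = 1,000 petaflops
per_chip_petaflops = POD_EXAFLOPS * 1_000 / POD_CHIPS
print(f"{per_chip_petaflops:.1f} PFLOPs per chip")  # → 4.6 PFLOPs per chip
```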
Ironwood is a signal of where AI is headed. Inference is the backbone of deploying AI in real-world applications: everything from chatbots that sound human to systems recommending your next watch on Netflix.
The race for GenAI supremacy is getting hotter, and as the one hyperscaler that is not reliant on Nvidia’s GPUs, Google has a powerful and far cheaper differentiator.
Meta Goes Big with Llama 4 & a 10M Context Window
Meta introduced the Llama 4 family, kicking off with Scout (17 billion active parameters, 16 experts) and Maverick (17 billion active parameters, 128 experts), both excelling in multimodal tasks like text and image processing.
Scout fits on a single NVIDIA H100 GPU and boasts a 10M token context window, which is huge. For reference, Google’s Gemini 2.5 Pro, released last week, only has a 1M token context window.
Maverick beats GPT-4o and Gemini 2.0 Flash on benchmarks, offering top performance per cost. These models, distilled from the still-training Llama 4 Behemoth (288 billion active parameters), are available on llama.com and Hugging Face, powering Meta’s apps like WhatsApp and Instagram.
Llama 4’s efficiency and multimodal capabilities make advanced AI more accessible to developers with limited resources. Scout’s huge context window allows for analyzing massive documents or creating detailed outputs, useful for research or creative projects.
Grok 3 API Launched, 2x the Price of Gemini
Elon Musk’s xAI has launched an API for Grok 3, their flagship LLM.
Grok 3 costs $3 per million input tokens (roughly 750k words) and $15 per million output tokens, with a smaller Grok 3 Mini priced lower. Here’s how Grok’s pricing compares, courtesy of Towards AI.
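To make those rates concrete, here is a minimal sketch of what a single Grok 3 call would cost at the list prices quoted above; the token counts in the example are illustrative assumptions, not xAI figures.

```python
# Rough cost estimate for one Grok 3 API call, using the rates
# quoted above: $3 per 1M input tokens, $15 per 1M output tokens.
GROK3_INPUT_PER_M = 3.00    # USD per 1M input tokens
GROK3_OUTPUT_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at Grok 3's list prices."""
    return (input_tokens / 1_000_000) * GROK3_INPUT_PER_M \
         + (output_tokens / 1_000_000) * GROK3_OUTPUT_PER_M

# Example: a 2,000-token prompt with a 500-token reply.
print(round(request_cost(2_000, 500), 4))  # → 0.0135
```

At roughly a cent and a half per mid-sized request, the cost only becomes material at high volume, which is exactly where the 2x gap versus Gemini 2.5 Pro starts to matter.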
It’s surprising that Grok 3 costs more than 2x as much as Gemini 2.5 Pro, which took the crown after its release last week. Users have also noted Grok’s context window is only 131k tokens, far less than the 1 million promised.
The high cost and open performance questions suggest Grok is still evolving compared to rivals like OpenAI. It could be that xAI’s resources are concentrated on consumer applications.
For those consumer users, it could mean more AI-driven features on platforms like X. Which developers will build with Grok 3, and will they prioritize accessibility or niche innovation?
Midjourney Answers OpenAI with V7 Release
Midjourney released their V7 model. The model enhances AI-generated art, improving how it interprets text prompts and renders images with finer details, like realistic hands or textures.
It includes personalization, letting users tailor outputs to their style, and Draft Mode, which creates images 10 times faster at half the cost for quick prototyping. Conversational and voice modes make it feel interactive, like a creative partner.
V7 makes high-quality AI art easier and cheaper, and Draft Mode speeds up brainstorming, which could streamline workflows in industries like gaming or advertising. Its intuitive features lower the learning curve, inviting more people to create with AI.
But even with these intuitive features, can Midjourney pull users away from OpenAI? After OpenAI went viral for going “Full Ghibli”, Midjourney had to ship an update, but is this update too late?
AI Inference Costs Keep Dropping
Falling inference costs are nothing new, but the AI Index 2025 did a great job visualizing just how much and how fast prices keep falling.
In 2022, using a model like GPT-3.5 cost $10 per million tokens. Today, far more performant models like GPT-4o are only $2.50 per million tokens. Smaller models, such as Llama-3.1-Instruct-8B, offer strong results at even lower prices.
As mentioned above, Grok 3 charges $3 per million input tokens, over 2x the $1.25 per million that Gemini 2.5 Pro charges. Consumers are in an incredible era of generative AI affordability, and these falling prices should spark more innovation in areas like personalized learning or small-business tools.
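The per-million-token figures cited in this article line up into a simple comparison; the sketch below uses only the prices quoted here (input-token rates, which may differ from each provider’s current pricing).

```python
# USD per 1M input tokens, as cited in this article.
prices = {
    "GPT-3.5 (2022)": 10.00,
    "GPT-4o": 2.50,
    "Grok 3": 3.00,
    "Gemini 2.5 Pro": 1.25,
}

baseline = prices["GPT-3.5 (2022)"]
for model, usd_per_m in sorted(prices.items(), key=lambda kv: kv[1]):
    drop = (1 - usd_per_m / baseline) * 100
    print(f"{model:15s} ${usd_per_m:5.2f}/M tokens ({drop:.0f}% below 2022 GPT-3.5)")
```

Even the priciest current model here undercuts 2022's GPT-3.5, and Gemini 2.5 Pro comes in at nearly 90% cheaper.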