Claude Sonnet 4.5 Sets Coding Standard, but Cognition's Rebuild Raises Questions, & How OpenAI's Instant Checkout Opens the Long Tail of Commerce
Sonnet 4.5 is the best coding model, but Cognition's rebuild shows the rising cost of integration, and OpenAI's new checkout feature helps the long tail of commerce get discovered for niche problems
Sonnet 4.5 Sets Coding Standard, but Devin’s Rebuild Raises Questions
Anthropic released Claude Sonnet 4.5, which posted the highest coding score to date on SWE-bench Verified. The new model also delivers major leaps in reasoning, math, and computer use.
Anthropic has found incredible product-market fit in coding and is building on that momentum with each subsequent release.
We wrote in August about how Anthropic has to keep leading in coding, since roughly 10–20% of its revenue comes from Cursor alone. If Google’s Gemini 3, rumored for release later in October, turns out to be more effective, Cursor could switch over and Anthropic’s revenue would drop significantly.
The reaction to Anthropic’s new model, coupled with Gemini 3’s upcoming release, reminded me of a meme that went viral during the last release wave.
Which leaves the question: how many companies will immediately adopt Sonnet 4.5? If it were as simple as plugging in a new LLM, I would imagine everyone might as well switch, but Cognition shows that just plugging in Sonnet 4.5 may be a bad idea.
Cognition Rebuilt Devin for Sonnet 4.5
Cognition shared an interesting update, explaining how they rebuilt Devin for Claude Sonnet 4.5.
Now you can see how much better the new version of Devin is, but also notice that some performance would have decreased had they simply plugged Sonnet 4.5 into the old Devin.
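To make that concrete, here is a minimal sketch of why swapping models isn't a one-line change for an agent product. Everything below is invented for illustration (the model labels, config fields, and values are assumptions, not Cognition's actual setup); the point is simply that prompts, tool budgets, and planning behavior get tuned per model, and all of that has to be revisited when the model changes.

```python
# Hypothetical per-model harness profiles. All names and values are
# illustrative assumptions, not Cognition's real configuration.
MODEL_PROFILES = {
    "previous-sonnet": {
        "system_prompt": "prompts/planner_v1.txt",
        "max_parallel_tool_calls": 2,
        "force_explicit_plan_step": True,   # older model drifted without a written plan
        "diff_retry_budget": 3,             # malformed patches were more common
    },
    "sonnet-4.5": {
        "system_prompt": "prompts/planner_v2.txt",  # rewritten, not reused
        "max_parallel_tool_calls": 8,               # handles more concurrent tool use
        "force_explicit_plan_step": False,          # forcing a plan step can cause over-planning
        "diff_retry_budget": 1,
    },
}

def build_agent(model_id: str) -> dict:
    """Assemble an agent loop tuned to a specific model's quirks."""
    profile = MODEL_PROFILES[model_id]
    # Keeping the old profile and changing only `model_id` is the
    # "just plug in the new LLM" shortcut that can hurt performance.
    return {"model": model_id, **profile}
```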
Rebuilding an application around each new model is no small undertaking, and it will significantly increase ongoing costs for products like Devin, especially when you consider the velocity of new model releases.
And given how frequently new models are being released (remember the meme above), will companies wait until they see Gemini 3’s results, or go full steam ahead with Sonnet 4.5?
Since betting markets heavily favor Google to have the best AI model by the end of 2025, it may not hurt to wait a couple of weeks, especially for companies facing rebuild costs similar to Devin’s.
This wrinkle could affect future valuations of applications like Devin, given that ongoing costs would be higher than previously assumed.
If those costs are higher, that could affect frontier LLM adoption as well. Will applications urgently adopt the next big thing, or wait for more options?
How OpenAI’s Instant Checkout Opens the Long Tail of Commerce
OpenAI launched Instant Checkout, allowing ChatGPT users to purchase items from Etsy’s domestic sellers and select Shopify merchants directly in ChatGPT.
This is powered by the new, open-sourced Agentic Commerce Protocol, developed by Stripe and OpenAI.
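I haven't dug into the full spec, so treat the following as a rough sketch of what an agent-initiated checkout call could look like rather than the actual Agentic Commerce Protocol schema; the endpoint path, field names, and payload shape here are all assumptions. The general idea is that the agent sends the merchant a structured order plus a delegated payment token instead of raw card details.

```python
# Rough illustrative sketch of an agent-initiated checkout call.
# The endpoint path, field names, and payload shape are assumptions made for
# explanation; consult the published Agentic Commerce Protocol spec for the
# real schema.
import json
import urllib.request

def create_checkout_session(merchant_base_url: str, item_id: str,
                            quantity: int, delegated_payment_token: str) -> dict:
    """POST a structured order to a merchant's commerce endpoint."""
    payload = {
        "line_items": [{"item_id": item_id, "quantity": quantity}],
        # A delegated token (e.g. issued through Stripe) stands in for card
        # details, so the assistant never handles raw payment credentials.
        "payment": {"delegated_token": delegated_payment_token},
        "fulfillment": {"country": "US"},
    }
    request = urllib.request.Request(
        url=f"{merchant_base_url}/checkout_sessions",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```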
While this announcement shouldn’t be too surprising, it is pretty exciting for the long tail of e-commerce vendors who were previously buried in Google, Shopify, or Etsy.
That excitement lies in ChatGPT’s 700 million weekly active users, who are already asking commerce questions about niche problems. You roughly know what you want, or at least the problem you need solved, and you need something to refine that into an actual product you can buy.
This happened with us earlier this year. We wanted a slide for the kids in our backyard, and Google showed us our options.
As you can see, they’re pretty expensive! We stopped by Home Depot for the cheaper green slide on the bottom left, but they didn’t actually stock it in store.
So then we showed ChatGPT a picture of our backyard, and it gave us a novel idea: high-density polyethylene (HDPE) plastic! There was a plastics store 15 minutes away in San Mateo that sold it by the roll, so we ordered 15 feet of plastic, picked up some artificial grass from Costco, and had a big slide ready for less than $150.
Now imagine all of the other long-tail products that could help with niche problems. They were impossible to find before because users didn’t know the idea or product existed, just like with our slide. With AI’s knowledge, these novel products and ideas can surface instantly, giving users a better solution while providing more revenue for those merchants.
As for how the deal works, OpenAI explains it in their announcement (emphasis mine):
Merchants pay a small fee on completed purchases, but the service is free for users, doesn’t affect their prices, and doesn’t influence ChatGPT’s product results. Instant Checkout items are not preferred in product results. When ranking multiple merchants that sell the same product, ChatGPT considers factors like availability, price, quality, whether a merchant is the primary seller, and whether Instant Checkout is enabled, to optimize the user experience.
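OpenAI doesn’t publish the ranking formula, but the factors in that paragraph read like inputs to a weighted score, with Instant Checkout only mattering when several merchants carry the same product. Here is a purely illustrative sketch; the weights, fields, and merchant names are my assumptions, not OpenAI’s implementation.

```python
# Purely illustrative scoring of merchants selling the same product, based on
# the factors OpenAI lists: availability, price, quality, primary seller, and
# whether Instant Checkout is enabled. The weights are made-up assumptions.
def score_offer(offer: dict, lowest_price: float) -> float:
    score = 0.0
    score += 3.0 if offer["in_stock"] else 0.0
    score += 2.0 * (lowest_price / offer["price"])       # cheaper is closer to 2.0
    score += 2.0 * offer["quality"]                      # quality signal in [0, 1]
    score += 1.5 if offer["is_primary_seller"] else 0.0
    score += 0.5 if offer["instant_checkout"] else 0.0   # small tiebreaker only
    return score

offers = [
    {"merchant": "HDPE Direct (hypothetical)", "price": 96.0, "in_stock": True,
     "quality": 0.8, "is_primary_seller": True, "instant_checkout": True},
    {"merchant": "Generic Reseller (hypothetical)", "price": 120.0, "in_stock": True,
     "quality": 0.6, "is_primary_seller": False, "instant_checkout": False},
]
lowest = min(o["price"] for o in offers)
ranked = sorted(offers, key=lambda o: score_offer(o, lowest), reverse=True)
print([o["merchant"] for o in ranked])
```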
So back to that big plastics store in San Mateo. They don’t do much business online, and would still appear on ChatGPT regardless of whether they used Instant Checkout.
But imagine companies that sell HDPE online and offer Instant Checkout; ChatGPT would show their links as options as well. If we bought from one of them instead, they would pay OpenAI a cut.
This feels like a big win for users (that slide was a massive win for our kids), and it brings more demand to commerce, especially the long tail of vendors with a novel solution that was previously buried in Google.