Hey folks,

Every time a new AI model drops, I get a little jolt of "I need to stay on top of this" so I don't fall behind. Last week it was Kimi K2.7-Code, the newest, best model, with a chart showing it beating the tools I use every day. My first instinct was to go try it. My second, thankfully, was to wait.

Good thing. Within days the numbers fell apart.

Here's what nobody tells you: there's a new "best" model just about every week now, with more than a dozen frontier releases in the first half of this year alone. You can't keep up, and chasing each one quietly makes you worse at the tools you've already got.

So today, no tutorial. Just one honest argument and a 60-second trick for telling the launches worth your time from the ones built to make you feel behind.

Let's dive in.

Be honest, how many AI tools did you switch between last month?

A launch is a sales pitch in a lab coat

Here's the thing I had to learn the slow way. A model launch isn't really an announcement. It's a sales pitch wearing a lab coat.

That doesn't mean the models are fake. The frontier really is moving fast, and some of these releases genuinely are better than what came before. But the stuff wrapped around a launch is marketing: the chart with your current tool sitting in second place, the "state of the art" headline, the thread with the one perfect demo. It's built to make you feel like you should be shopping for a new model every few weeks.

And the trick underneath it is almost rude in how simple it is. The lab picks the one test it wins. It runs its own model flat out and the competition with the handbrake on. Then it puts the chart out before anyone neutral has had a chance to check. By the time the correction lands, the headline has already done its job and moved on.

So what actually pays off? Not knowing which model launched yesterday. It's getting good enough at one tool that you'd feel it the day something better showed up.

Most people never get there. They're too busy switching.

Want proof? Last week handed us some.

Back to that Kimi launch. The benchmark table put it level with GPT-5.5 and Claude Opus 4.8, the big guns from OpenAI and Anthropic. Genuinely strong numbers. The kind of chart that's in 50 newsletters by breakfast.

Then you look a little closer. Three days on, the model still had no independent scores on any test outsiders can check. Every number in that table was the lab's own. And the rivals it "beat"? Run in different settings than Kimi, which is a bit like timing your car against someone else's while theirs idles in the driveway.

None of that makes Kimi bad. It might be excellent. The point is you had no way of knowing on launch day, and the chart was built so you wouldn't think to ask. A real number and a made-up one look exactly the same on screen.

LLM traffic converts 3× better than Google search

58% of buyers now start their research in ChatGPT or Gemini, not Google. Most startups aren't showing up there yet.

The ones that are get cited by the AI tools their buyers, investors, and future hires already use. And they convert at 3×.

Download the free AEO Playbook for Startups from HubSpot and get the exact steps to start showing up. Five minutes to read.

So how do you tell the difference?

You don't need to understand benchmarks. You need 4 questions. Run them on any launch, model or tool, before you let it move you.

1. Who ran the test? If the only numbers come from the company selling the thing, that's a claim, not a result. Treat their benchmark as an opening offer, not a fact.

2. Has anyone independent checked? Real proof shows up on public leaderboards that outsiders run. No independent scores yet? The launch is asking for your trust. Wait a week. The real numbers always come, and they're usually lower.

3. Were the rivals run fairly? Look for the asterisk. If they pit their best setup against everyone else's average one, the gap on the chart isn't the gap in your actual work.

4. Does it change your Monday? The only one that really matters. A 2-point benchmark win you'll never feel isn't worth relearning your tools. "Faster at something I do 10 times a day" is. Most launches flunk this one, and honestly, that's fine.

Pro tip: Don't want to do this by hand? Make your current AI do it for you. Next time a launch lands in your inbox, paste it in with the prompt below.

Read this AI launch announcement as a skeptical analyst, not a fan. 
Answer in 4 short lines: 

(1) Which numbers are the company's own vs. independently verified? 
(2) What public, audited benchmarks are missing? 
(3) Were competing models run in comparable conditions, or is there an asterisk? 
(4) For someone whose job is [your job here], does anything here change their workflow, yes or no, and why? 

No hype. If something is unverifiable, say so.

Watch out: Sometimes the AI gets swept up in the announcement's own hype and grades it kindly. If every answer comes back glowing, push back once: "Be harder. What would a critic say is overstated here?" The second pass is usually the honest one.
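
If you'd rather script it than paste by hand, here's a minimal sketch of the same prompt as a reusable Python template. The names `build_prompt`, `PROMPT_TEMPLATE`, and `FOLLOW_UP` are made up for illustration, and adding the announcement under an "Announcement:" label is my assumption about how you'd paste it in. It only builds the text; sending it to your AI tool is up to you.

```python
# Hypothetical helper: fills the skeptical-analyst prompt with your job
# and the launch announcement. It builds text only; it calls no AI service.

PROMPT_TEMPLATE = """Read this AI launch announcement as a skeptical analyst, not a fan.
Answer in 4 short lines:

(1) Which numbers are the company's own vs. independently verified?
(2) What public, audited benchmarks are missing?
(3) Were competing models run in comparable conditions, or is there an asterisk?
(4) For someone whose job is {job}, does anything here change their workflow, yes or no, and why?

No hype. If something is unverifiable, say so.

Announcement:
{announcement}"""

# The push-back line to send if every answer comes back glowing.
FOLLOW_UP = "Be harder. What would a critic say is overstated here?"


def build_prompt(announcement: str, job: str) -> str:
    """Return the full prompt with the job and announcement filled in."""
    return PROMPT_TEMPLATE.format(job=job.strip(), announcement=announcement.strip())


if __name__ == "__main__":
    # Placeholder text; paste the real launch email here.
    print(build_prompt("<paste the launch announcement here>", "marketing manager"))
```

Keeping the push-back line as its own constant means the second pass is one paste away when the first answer reads too kindly.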

And soon your phone is going to ask you to pick

Here's why this only gets more important.

At its developer conference on June 8, Apple announced that iOS 27 will let you choose which AI runs inside your iPhone. Not just ChatGPT anymore. You'll be able to set Claude, Gemini, ChatGPT, or Grok as the assistant behind Siri, writing tools, and image features, all from one toggle in Settings.

It's not live yet, so don't go hunting for it. The full release is months out, around September, and the part that lets you swap in another provider wasn't even shown on stage. But the direction is set. The one-size-fits-all assistant is on its way out.

When that day comes, the same rule holds. Apple will pick a default for its reasons, not yours. The coverage will crown a winner. And the smart move will be the boring one: choose the tool that fits the work you actually do, learn it properly, and stop re-shopping every time a new chart floats by.

Try this now

Go find the last "new AI model" email sitting in your inbox. You've got one, I promise. Before you do anything with it, run the 4 questions: whose numbers, checked by whom, fair fight, and does it change your Monday.

I'd put money on it failing question 4. Most do. Then go do the thing that actually moves the needle: open the one AI tool you already pay for and spend 15 minutes getting better at it. Learn one feature you've been ignoring. That quarter of an hour will do more for you than any model you could have switched to this month.

👉️ Know someone who gets that same "am I behind?" jolt every time a model drops? Forward this to them. When one person subscribes from your unique referral link, you unlock the Prompt Vault: 10 copy-paste prompts that do real work.

And hit reply with one thing: which AI tool do you actually use most, and what nearly tempted you to switch? I read every reply, and it shapes what I write next.

Cheers, Tim
