
Happy Friday folks,
OpenAI just upgraded Deep Research with GPT-5.2 and added website-specific search. You can now lock research to trusted sources instead of letting it crawl the entire web. The interface got an overhaul too: full-screen reports, real-time progress tracking, and the ability to interrupt mid-research to add sources or redirect. Meanwhile, OpenAI's celebrating 1 million Codex downloads in week one while quietly warning that free-tier limits are coming. Classic move: build demand, then meter access.
Today in AI:
→ Opus v Codex: Why benchmarks stopped mattering
→ OpenAI's free Codex tier won't last (1M downloads in week 1)
→ How To Use Claude Code Insights To Fix Your Workflow
Let's dive in...

🎯 Opus v Codex: Welcome to the Post-Benchmark Era

Opus 4.6 and Codex 5.3 dropped within days of each other. Both companies hyped their numbers. Both hit new highs on SWE-Bench and coding evals. And when researchers used them, the benchmarks meant nothing.
Codex 5.3 wins on paper. It's better at finding bugs. Faster. Uses fewer tokens. Beats Opus on most evals.
Opus 4.6 wins in practice. More reliable on real tasks with better UX. Doesn't ignore instructions when juggling multiple requests. Claude Code's interface makes it easier to actually get work done.
The gap between "scores well" and "works well" is now massive. Gemini 3 topped benchmarks and basically nobody uses it. Codex technically outperforms Opus on evals but loses on adoption. Why? Claude's product is just more usable.
Three things replace the benchmarks:
Agent orchestration - Can it coordinate multiple tasks without falling apart?
Instruction following - Does it actually do what you asked, or improvise?
Product polish - Is the interface pleasant enough that people use it daily?
Both models regressed on instruction following when handling queued tasks. That's a step backward disguised as a capability upgrade. The companies chasing benchmark points missed what users need.
We're in a world where the "worse" model wins if it ships with better UX. Product execution beats model capability. And the launch playbooks haven't caught up yet.
For your team: Stop asking "which model scores better?" Start asking "which one do people actually use?" If you're building on frontier models, test with real workflows, not evals.

Together with Kajabi
Where Expertise Becomes a Real Business
Kajabi was built for people with earned expertise. Coaches, educators, practitioners, and creators who developed their wisdom through real work and real outcomes.
In a world drowning in AI-generated noise, trust is the new currency. Trust requires proof, credibility, and a system that amplifies your impact.
Kajabi Heroes have generated more than $10 billion in revenue. Not through gimmicks or hype, but through a unified platform designed to scale human expertise.
One place for your products, brand, audience, payments, and marketing. One system that helps you know what to do next.
Turn your experience into real income. Build a business with clarity and confidence.
Kajabi is where real experts grow.

📊 OpenAI's Free Codex Is Ending

Codex hit 1 million downloads in its first week. Now OpenAI's warning: the free tier won't last.
Sam Altman posted the update himself. "We'll keep Codex available to Free/Go users after this promotion; we may have to reduce limits there but we want everyone to be able to try Codex and start building." Translation: if you're not paying, get ready for throttling.
Here's what's happening:
The rush: 60% week-over-week growth in Codex users after the app launched. Developers downloaded it faster than any OpenAI tool in history. The desktop app, parallel agents, and scheduled tasks hit exactly what people wanted.
The reality check: Running agents for 30 minutes per task at scale costs real money. Free and Go tier users are about to hit stricter limits. Paid subscribers already get doubled rate limits and will stay prioritized.
The bigger picture: This is OpenAI testing demand before locking down pricing. They did the same dance with GPT-4 and DALL-E. Give it away, build dependency, then meter access. Classic SaaS playbook.
Meanwhile, ChatGPT Deep Research got a quiet upgrade worth knowing about:
GPT-5.2 model - Smarter research, fewer hallucinations
Website-specific search - Lock research to trusted sources instead of the entire web
Real-time controls - Add sources or redirect mid-research without restarting
Fullscreen document viewer - Table of contents on the left, citations on the right, actually readable
For your team: If you're using Codex heavily on the free tier, budget some cash before the limits hit. Don't get caught mid-project when throttling kicks in.

🎯 How To Use Claude Code Insights To Fix Your Workflow

Claude Code just shipped a command that reads your last month of work and tells you what you're doing wrong. It's called /insights, and it's basically a performance review for how you use AI.
Here's how it works and why it's useful:
1. Open Claude Code in any project and type /insights
2. Claude reads your message history from the past 30 days across all your sessions.
3. Wait 2-3 minutes while it generates an HTML report.
4. Read the report (it'll surprise you).
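In practice, the flow looks roughly like this (a sketch of the interactive session; the exact prompts and the report filename are assumptions, not documented output):

```shell
# Start Claude Code from any project you've been working in
cd ~/projects/my-app
claude

# At the interactive prompt, run the slash command:
#   > /insights
# Claude scans ~30 days of message history across your sessions,
# then generates an HTML report you can open in a browser
# (filename here is hypothetical):
#   open insights-report.html
```

Note that /insights runs inside the Claude Code session itself, not as a regular shell command.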
The report has four sections:
What's working - Your effective patterns.
What's hindering - Bottlenecks killing your speed.
Quick wins - Features you're not using but should be.
Project analysis - How you're using Claude across different codebases.
Why this is useful:
Most people use 20% of Claude Code's capabilities. The /insights report shows you the 80% you're missing, with copy-paste examples specific to your work.

🐝 AI Buzz Bits
📄 ChatGPT Deep Research got a proper document viewer. Full-screen mode with table of contents on the left, citations on the right. Export to Markdown, Word, or PDF. Plus: GPT-5.2 model upgrade, website-specific search, and real-time research controls.
🇨🇳 Five Chinese AI labs racing to ship before Lunar New Year. Zhipu, MiniMax, ByteDance, Alibaba, and Moonshot are all debuting next-gen models in February. GLM-5, Doubao 2.0, Qwen 3.5, Kimi K2.5 all dropping within weeks of each other. The most concentrated period of Chinese AI releases ever.
❄️ Snowflake launched Cortex Code. An AI coding agent that understands your Snowflake data context. Ask in plain English, get SQL, dbt pipelines, or admin actions. Unlike generic coding assistants, it knows your databases, roles, and governance.

🛠 Tool Spotlight
QA.tech — AI testing that finds bugs. Point it at your web app and it autonomously explores, generates test cases, and adapts when your UI changes. No manual test scripting.
CreateOS — Build and deploy apps from any AI coding tool in one place. Connect Cursor, Claude Code, or any MCP-compatible tool and get instant deployment with no DevOps setup.
Atoms — A full AI team that builds and launches products. Describe your idea, and specialized agents handle research, design, code, auth, payments, and deployment.
Crisp — AI customer support that doesn't feel robotic. Train it on your docs, deploy across chat, WhatsApp, email, and Instagram. Automates 50% of support inquiries while escalating the tricky stuff to humans.
For a full list of 1500+ AI tools, visit our Supertool Directory




👉 Know someone drowning in AI news? Forward this to them or send your unique referral link
Cheers, Tim


