README

Grok is just terrible at Chess

Drekken

Sept 17th 2025

Why hello Reader,

On top of Pokémon and StarCraft, LLMs have started to play, wait for it… chess!?! But the exciting part is this is the FIRST time we get to see LLMs go up against one another, thanks to a Kaggle & DeepMind AI chess tournament.

Grok 4 looked so good playing until it got cooked by an older model. Their quality of play was… a thing 😅. But still, there’s something in here we can steal for our own projects.

🔁 In Case You Missed It

The Debug Tool Behind Xena’s GODLIKE Bot Micro

Debugging bot micro is brutal—logs everywhere, answers nowhere. I sat down with Xena’s Soupcatcher and he showcased a tool he built that lets you actually see your bot’s decisions… in real time.

We playin chess now

  • Top LLMs: o3, Grok 4, Gemini 2.5 Pro & Flash, o4-mini, Claude Opus 4, DeepSeek R1, Kimi K2
  • NO special training, just testing general cognitive skills
  • Finals were commentated by the Chess GOAT, Magnus Carlsen himself (in jeans👖, if you know, you know)

Why chess when engines are already so good?

Stockfish would absolutely dunk on these LLMs. But that’s not the point. The whole idea was to see how they handle reasoning and planning in a world with hard rules and clear outcomes.

Chess is perfect for that:

  • You gotta track a bunch of relationships at once
  • You gotta think ahead
  • You get a rich history of AI research to baseline against

What stood out

  • LLMs made mistakes that no chess engine would ever make
  • Some literally forgot where pieces were or how they moved
  • A bunch of games ended not in checkmate, but because they burned 4 tries making illegal moves
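
If you want the same "illegal move budget" guard in your own bot, here's a minimal sketch. It assumes a hypothetical `propose` callback (standing in for whatever model or policy suggests moves) and a known set of legal moves; the names and the default of 4 attempts are my own illustration, not the tournament's actual harness.

```python
from typing import Callable, Iterable, Optional


def pick_legal_move(
    propose: Callable[[int], str],
    legal_moves: Iterable[str],
    max_attempts: int = 4,
) -> Optional[str]:
    """Return the first legal proposal, or None once the attempt budget is spent."""
    legal = set(legal_moves)
    for attempt in range(max_attempts):
        move = propose(attempt)  # hypothetical: ask the model/policy for a move
        if move in legal:
            return move
    return None  # caller treats this as a forfeit


# Toy proposer: two illegal guesses, then a legal one.
guesses = ["Ke9", "Qxz7", "e4"]
move = pick_legal_move(lambda i: guesses[min(i, len(guesses) - 1)], {"e4", "d4", "Nf3"})
```

The point is that validation lives outside the thing making the guesses, so a confused policy can lose a game but can't corrupt it.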

Grok 4 vs… o3!?

Grok 4 looked unstoppable at first. 4-0 slap on Gemini “Flash,” then rolled past Gemini Pro in the semis. It gave little explanation, played with confidence, and made it all look effortless.

Then came o3.

Different style: “chatty”, dropping commentary about its moves, showing what it understood about the game state, and methodically thinking through its next move. It too had little trouble in its bracket, though it played more slowly. Even then, most people didn’t think o3 would out-muscle Grok after seeing Grok’s earlier dominance.

Game 1: Grok dropped a bishop and tried to trade even though it was already behind, which spiraled into a checkmate.

Game 2: Grok sent its queen after a pawn, not realizing the pawn was defended. Collapse followed quickly. Commentators even speculated that maybe Grok was cooking up some galaxy-brain trap.

By games 3 and 4, the wheels fully came off. Disaster after disaster, and o3 shocked everyone when it CRUSHED Grok 4 with a clean sweep, not dropping a single game.

Final score: o3 4-0 Grok. Rekt.

The takeaway

o3 had a strong command of the state of the game. It made “don’t mess up” the main goal instead of chasing flashy wins. Keep the state clear, and reduce the errors to come out on top.

🛠️ In the Workshop

PiGBot – refactored attack systems. Now has a squad-based response to threats, so it won’t send the whole army after a lone reaper.
Botato – tighter openings, quicker builds, and smarter unit handling. Cancels bad builds, rallies strays back into the squad, and lines up workers in advance.

🗒️ ./run Notes:

Win on error rate, not highlight plays. Your bot should close out won states and avoid catastrophic errors. Flashy spikes make good demos; stability wins ladders.

So this week, when you’re triaging your bot’s priorities, aim for the quick wins:

  • Reduce crash rate: State is everything. Keep a single source of truth for game state: material/army value, map control, objectives, and timers. Make all policy decisions read from it.
  • Stick to the rules: Benchmark your build timings and compositions against what you see in top play, and build out robust responses to timed cheese attacks before focusing on flashy offense.
  • Play yourself: Have your bot go against itself and watch where it exploits its own weaknesses. That’s where the biggest leaks show up.
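
Here's one rough way the "single source of truth" idea could look. This is a sketch under my own assumptions: the `GameState` fields, the `choose_action` policy, and the 1.5x attack threshold are all made up for illustration, so swap in whatever your bot actually tracks.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical single source of truth; every policy reads from this."""
    army_value: int = 0
    enemy_army_value: int = 0
    bases_held: int = 1
    game_time_s: float = 0.0
    threats: dict = field(default_factory=dict)  # location -> enemy value seen there

    def update(self, **observed):
        # Reject unknown fields so a typo can't silently fork the state.
        for key, value in observed.items():
            if not hasattr(self, key):
                raise AttributeError(f"unknown state field: {key}")
            setattr(self, key, value)


def choose_action(state: GameState) -> str:
    """Toy policy: handle threats first, attack only when clearly ahead."""
    if state.threats:
        return "defend"
    if state.army_value > 0 and state.army_value >= 1.5 * state.enemy_army_value:
        return "attack"
    return "macro"


state = GameState()
state.update(army_value=300, enemy_army_value=100)
action = choose_action(state)
```

Because every decision goes through one object, a self-play run that goes wrong gives you one place to log and inspect instead of five managers that each remember the game differently.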

May the Bugs Be Ever in Your Favour🪲
