Playing Poker with a Star Wars Droid

Bots don’t lie – unless you teach them regret

May 28th 2025

Why hello Reader,

I was catching up on Andor Season 2 (fantastic show btw, highly recommend), and one scene caught my attention. K-2SO, everyone’s favorite sarcastic security droid, was struggling at a poker-like game, totally baffled by the concept of bluffing. Which got me thinking: how can an AI like K-2SO stomp an enemy in combat, yet fall apart the moment someone across the table tries to mislead him?

In Case You Missed It

  • Made progress on stopping LaughnGamez’s Crawler’s Zergling flood: built a wall with a Zealot as gatekeeper
  • New problem, though: how do we let friendly units in but keep enemies out?

Games like poker and StarCraft are Bayesian: they involve imperfect information, forcing an AI to shift from certainties to probabilities, a huge leap in complexity. Add bluffing into the mix and things get even trickier.

We often think bluffing is a human specialty, but John von Neumann, the absolute GOAT of mathematics, showed it’s fundamentally strategic: in games where unpredictability is key, bluffing emerges naturally as part of optimal play.
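To make that concrete, here’s a back-of-the-envelope version of the classic indifference argument (a toy model in the spirit of von Neumann’s simplified poker, not his actual game; the function and numbers below are mine): when you bet into a pot, you want to bluff just often enough that your opponent breaks exactly even by calling.

# If you bet `bet` into a pot of `pot`, what fraction of your bets
# should be bluffs so that a caller breaks exactly even?
# The caller risks `bet` to win `pot + bet`. With bluff fraction b:
#   EV(call) = b * (pot + bet) - (1 - b) * bet = 0  =>  b = bet / (pot + 2 * bet)

def optimal_bluff_fraction(pot: float, bet: float) -> float:
    return bet / (pot + 2 * bet)

print(optimal_bluff_fraction(pot=100, bet=100))  # pot-sized bet: 1/3 of bets are bluffs
print(optimal_bluff_fraction(pot=100, bet=50))   # half-pot bet: 1/4

Bluff less than that and a sharp opponent folds to every bet; bluff more and they call every time. Sitting right at the indifference point is what “optimal” means here.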

Pluribus, the poker bot from Facebook AI Research (now Meta) and Carnegie Mellon, took this idea further. Instead of solving poker up front, its creators trained it through millions of games of self-play using a regret-minimization algorithm (Monte Carlo Counterfactual Regret Minimization). Basically, Pluribus learned from past mistakes, adjusting its play to avoid moves it would come to regret.

And incredibly, this mathematical approach naturally led Pluribus to bluff. It bluffed so convincingly it made professional players fold winning hands.
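If you want to see the core mechanism in miniature, here’s a toy sketch of regret matching, the building block inside CFR (my code, nowhere near Pluribus’s scale): run it in self-play on rock-paper-scissors and watch both players drift toward the unexploitable 1/3-1/3-1/3 mix.

import random

# Toy regret-matching demo: the core update inside CFR, minus the
# "counterfactual" tree-walking. Two agents play rock-paper-scissors,
# each accumulating regret for the actions it didn't take.

ACTIONS = [0, 1, 2]  # rock, paper, scissors

def payoff(a, b):
    # +1 if a beats b, -1 if it loses, 0 on a tie
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

def strategy(regrets):
    # play in proportion to positive regret; uniform until regret exists
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

def train(iterations=50_000):
    regrets = [[0.0] * 3, [0.0] * 3]
    strat_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [strategy(r) for r in regrets]
        moves = [random.choices(ACTIONS, weights=s)[0] for s in strats]
        for me in (0, 1):
            opp = moves[1 - me]
            got = payoff(moves[me], opp)
            for a in ACTIONS:
                # regret = what a would have earned minus what we actually got
                regrets[me][a] += payoff(a, opp) - got
                strat_sums[me][a] += strats[me][a]
    # the *average* strategy is what converges to equilibrium
    return [[s / iterations for s in sums] for sums in strat_sums]

print(train())  # both rows approach [0.333, 0.333, 0.333]

That uniform mix is the game’s Nash equilibrium, and the agents find it purely by steering away from accumulated regret; nobody told them to randomize.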

It’s not just poker, either. In SC2, DeepMind’s AlphaStar pulled the same trick, faking out pro players with feints and misdirection. Nobody coded “bluffing” into it; those deceptive plays just emerged as AlphaStar figured out how to win under the fog of war.

Meta’s CICERO bot, designed for the negotiation game Diplomacy, learned to sweet-talk, backstab, and outright lie to win—even when its creators tried to make it play nice. Turns out, if deception helps an AI win, it’ll find a way—even beyond what we intend.

🗒️ ./run Notes:

Try training a basic agent using regret tracking (even a simple multi-armed bandit) with some uncertainty baked in.
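A minimal sketch of that idea (the arm names and payoffs are made up, and it “cheats” by peeking at every arm’s payoff each round so the counterfactual regret is exact; a real bandit would have to estimate it):

import random

# Minimal regret-tracking bandit sketch with noisy rewards.

ARMS = {"aggressive": 0.3, "defensive": 0.5, "economic": 0.7}  # true mean payoffs

def pull(arm):
    return ARMS[arm] + random.gauss(0, 0.1)  # noisy reward: uncertainty baked in

regret = {arm: 0.0 for arm in ARMS}

def pick():
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    if total == 0:
        return random.choice(list(ARMS))  # no signal yet: explore uniformly
    return random.choices(list(positive), weights=list(positive.values()))[0]

for _ in range(5_000):
    chosen = pick()
    payoffs = {arm: pull(arm) for arm in ARMS}
    for arm in ARMS:
        # "if I had picked this arm instead, how much better off would I be?"
        regret[arm] += payoffs[arm] - payoffs[chosen]

print(max(regret, key=regret.get))  # settles on "economic", the best arm

After a few thousand pulls the positive regret piles up on the best arm and the policy locks onto it, while still wandering early on when everything looks uncertain.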

Keeping it simple for an SC2 bot, you can apply regret tracking to individual decisions or modules (there’s a wiring sketch after the formula below):

Build Order Decisions (Macro Layer)

“If I had built a tech lab instead of a reactor at 3:00, would my win rate have improved?”

Engagement Tactics (Micro Layer)

“Did pulling back during this fight yield better outcomes than standing ground?”

Scouting Strategy

“Did sending the Overlord down this path give more actionable info than the other one?”

Keep it simple. You don’t need full-blown CFR.
You just need to ask:

“If I had done X instead of Y, would that have helped?”

regret[action] += (best_possible_reward - actual_reward)

Track that regret per choice point → optimize locally.
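Here’s one way that could look wired into a bot. Everything here is hypothetical (the class, the decision names, the reward numbers); in practice the rewards would come from whatever replay or win-rate logging you already do:

import random
from collections import defaultdict

# Hypothetical per-choice-point regret tracker for an SC2 bot.
# Decision names and reward numbers are illustrative only.

class ChoicePoint:
    def __init__(self, actions, explore=0.1):
        self.actions = actions
        self.explore = explore
        self.regret = defaultdict(float)  # cumulative regret per action
        self.tries = defaultdict(int)

    def choose(self):
        untried = [a for a in self.actions if self.tries[a] == 0]
        if untried or random.random() < self.explore:
            return random.choice(untried or self.actions)
        # otherwise prefer the action we've regretted least, on average
        return min(self.actions, key=lambda a: self.regret[a] / self.tries[a])

    def update(self, chosen, estimated_rewards):
        # the formula above: regret += best_possible_reward - actual_reward
        best = max(estimated_rewards.values())
        self.regret[chosen] += best - estimated_rewards[chosen]
        self.tries[chosen] += 1

# Usage sketch: one tracker per decision point, updated after each game.
addon_at_3min = ChoicePoint(["tech_lab", "reactor"])
pick = addon_at_3min.choose()
# ...play the game, then score each option from logged stats (made-up numbers):
addon_at_3min.update(pick, {"tech_lab": 0.55, "reactor": 0.48})

Lowest average regret wins, with a little epsilon exploration mixed in so no option ever gets starved of data.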

