Loowit Play — Yahtzee that learns

New game

Play a full 13-turn Yahtzee game. Pick who you're racing against — each agent plays its own game and you compare final scores.

Max Scorer: Plays to score as high as possible.

Win Over Everything: Cares only about beating you, and gambles when behind.

Optimal (DP): Mathematically perfect play, averaging about 254 points. Hard to beat.

Agents play:

Quick sim: Agents auto-advance to keep pace with you.

Watch decisions: Step the agents one turn at a time and see each move.

Dice:

Own dice: Each player rolls their own, as in realistic multiplayer.

Shared dice: Everyone gets the same rolls, which removes luck so strategy decides.

Agents now play turn-by-turn and react to the live scoreboard — watch Win Over Everything gamble when it falls behind.

Simulation lab

Run a batch of full games and see how the agents score. Pick one agent to see its score distribution, or two for a head-to-head (any combo — even an agent against itself).

How the agents play differently

Each agent has a distinct fingerprint. The radar compares them across six dimensions; the table below shows their risk profile and how far each strays from optimal play — and crucially, whether it gambles when behind.

Strategic, Intelligent, Adaptive, Focused

How these agents learned Yahtzee

Loowit Play builds games that learn. The two learned opponents here, Max Scorer and Win Over Everything, are neural networks trained from scratch with reinforcement learning (PPO). They started knowing nothing and played millions of games, improving by trial and error until strategy rose above the noise.
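For readers new to PPO, the heart of each update is a clipped surrogate objective that stops the policy from moving too far in one step. The sketch below is a generic, pure-Python illustration of that objective, not the training code behind this site. The function name, the 0.2 clip range, and the sample numbers in the usage note are assumptions chosen for illustration.

```python
import math

def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximised).

    new_logps / old_logps: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps is an assumed value.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(new_lp - old_lp)
        # Clamp the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the two estimates.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 1.5 and an advantage of +1, the clipped term caps the contribution at 1.2. With a ratio of 0.5 and an advantage of -1, the pessimistic choice is -0.8. In both cases the policy gains nothing extra from moving further away from the old policy.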

Curriculum, not chaos

Rather than dropping the agent into full games, training used a curriculum: first master single categories, then the upper section, then the lower section, then the full game. Skills build up layer by layer — you can't play a good full game without knowing how to optimise a single category first.
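The staged progression described above could be sketched as a simple promotion rule: train on one stage until recent performance clears a bar, then move on. This is an illustration only. The `Stage` class, the 0.8 thresholds, the 100-episode window, and the assumption that rewards are normalised to the range 0 to 1 are all hypothetical and not taken from the actual training setup.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    promote_at: float  # hypothetical mean-reward threshold for promotion

# Stage order follows the text; thresholds are made up for illustration.
CURRICULUM = [
    Stage("single categories", 0.8),
    Stage("upper section", 0.8),
    Stage("lower section", 0.8),
    Stage("full game", float("inf")),  # final stage, never promoted
]

def next_stage_index(current, recent_rewards, window=100):
    """Return the index of the stage to train on next."""
    if len(recent_rewards) < window:
        return current  # not enough evidence yet
    mean = sum(recent_rewards[-window:]) / window
    is_last = current == len(CURRICULUM) - 1
    if not is_last and mean >= CURRICULUM[current].promote_at:
        return current + 1
    return current
```

A training loop would call `next_stage_index` after each batch of episodes and switch environments whenever the returned index changes.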

Two different goals

"Shared dice" — removing luck

When the agents are compared, they draw from the same dice stream, so any difference in score comes from strategy, not lucky rolls. The Winner agent (Win Over Everything) was trained this way against a league of opponents, learning when to play safe and when to gamble.
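One way to give every player the same dice is to pre-generate a table of rolls from a single seed and have each player read from it. The page does not say how the shared stream is implemented here, so the sketch below is one possible design under that assumption. The function names and the per-position lookup scheme are hypothetical.

```python
import random

def shared_dice_table(seed, turns=13, rolls_per_turn=3, dice=5):
    """Pre-generate die values indexed as table[turn][roll][position]."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [[[rng.randint(1, 6) for _ in range(dice)]
             for _ in range(rolls_per_turn)]
            for _ in range(turns)]

def apply_roll(table, turn, roll, current, keep):
    """Return the dice after roll number `roll` (0-based) of `turn`.

    Positions flagged True in `keep` retain their current value; every
    other position takes the shared value for that turn, roll and position.
    """
    fresh = table[turn][roll]
    return [c if k else f for c, k, f in zip(current, keep, fresh)]
```

Because values are looked up by position rather than consumed from a stream, two players who keep different dice still see the same value whenever they reroll the same position on the same roll.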

The hint button

The "What would the agents do?" button runs each agent's policy on your exact dice and scorecard and shows its recommended move. Watch for spots where Max Scorer and Win Over Everything disagree: that is Win Over Everything taking a risk to win, even if it lowers its expected score.
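A toy calculation shows why the two agents can disagree on the same position. The numbers below are invented for illustration and are not real Yahtzee probabilities: a "safe" move that scores 20 for certain, and a "gamble" that scores 50 with probability 3/10 and 0 otherwise. The helper names are hypothetical.

```python
from fractions import Fraction

def expected_score(outcomes):
    """Expected points for a move given (points, probability) pairs."""
    return sum(p * pts for pts, p in outcomes)

def win_probability(outcomes, deficit):
    """Chance of scoring strictly more than `deficit` points."""
    return sum(p for pts, p in outcomes if pts > deficit)

# Made-up outcome distributions, not real Yahtzee odds.
OPTIONS = {
    "safe": [(20, Fraction(1))],
    "gamble": [(50, Fraction(3, 10)), (0, Fraction(7, 10))],
}

def pick(options, key):
    """Return the name of the option that maximises `key`."""
    return max(options, key=lambda name: key(options[name]))

by_score = pick(OPTIONS, expected_score)
by_win = pick(OPTIONS, lambda o: win_probability(o, deficit=30))
```

Here `by_score` is "safe" (20 expected points against 15), while `by_win` is "gamble": when trailing by 30, the safe move cannot win at all, and the gamble wins 3 times in 10.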

Tools & tech

What's under the hood — from training the agents to shipping this site.

Machine learning & RL

PyTorch, Reinforcement Learning, PPO, Actor–Critic networks, GAE, Curriculum learning, Self-play / league, KL trust-region, Reward shaping, Dynamic programming (optimal solver), NumPy

Inference & backend

ONNX, ONNX Runtime, Python, stdlib http.server, REST / JSON API, SQLite, PBKDF2 auth, Expectimax search

Frontend

Vanilla JavaScript, HTML5, CSS3, SVG charts, Merriweather / Inter

Infrastructure & deploy

Docker, Caddy, AWS EC2 (Graviton/ARM64), Let's Encrypt TLS, Cloudflare DNS, Linux, TensorBoard