How these agents learned Yahtzee
Loowit Play builds games that learn. Both opponents here are neural networks trained
from scratch with reinforcement learning, specifically Proximal Policy Optimization (PPO).
They started knowing nothing and played millions of games, improving by trial and error
until a consistent strategy rose above the noise of the dice.
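For readers curious about the PPO part: at its core, Proximal Policy Optimization maximises a clipped surrogate objective. This objective stops each update from moving the policy too far from the version that collected the games. Below is a minimal pure-Python sketch of that objective for a batch of moves. The function name and the clip value of 0.2 (a commonly used default) are assumptions, not details of how these particular agents were trained.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximised).

    new_logp / old_logp: log-probabilities of the taken moves under the
    current policy and under the policy that played the games.
    advantages: how much better than expected each move turned out.
    clip_eps: assumed clip range; not taken from the agents' real config.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the two terms, so the policy
        # gains nothing from pushing the ratio outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, every ratio is 1 and the objective is just the mean advantage. Clipping only matters once an update tries to change a move's probability by more than `clip_eps`.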
Curriculum, not chaos
Rather than dropping the agent straight into full games, training followed a curriculum: first
master single categories, then the upper section, then the lower section, and finally the full
game. Skills build up layer by layer: you can't play a good full game without first knowing how
to optimise a single category.
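One way to picture the curriculum is as a simple stage schedule: train on one stage until the agent clears a mastery bar, then move to the next. The page doesn't say how mastery was measured. The sketch below therefore assumes a normalised evaluation score and made-up thresholds. The stage names, the `train_step`/`evaluate` callbacks and the per-stage step budget are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    mastery_threshold: float  # hypothetical: normalised eval score needed to advance


# Stage order follows the text; thresholds are illustrative assumptions.
CURRICULUM: List[Stage] = [
    Stage("single_category", 0.9),
    Stage("upper_section", 0.8),
    Stage("lower_section", 0.8),
    Stage("full_game", float("inf")),  # final stage: keep training until budget runs out
]


def run_curriculum(train_step: Callable[[str], None],
                   evaluate: Callable[[str], float],
                   max_steps_per_stage: int = 1000) -> List[str]:
    """Train each stage until it is mastered (or its budget is spent), then move on.

    Returns the names of the stages the agent mastered.
    """
    mastered = []
    for stage in CURRICULUM:
        for _ in range(max_steps_per_stage):
            train_step(stage.name)
            if evaluate(stage.name) >= stage.mastery_threshold:
                mastered.append(stage.name)
                break
    return mastered
```

Under this design, the same network carries its weights from stage to stage. Skills learned on single categories are therefore the starting point for the sections, and the sections in turn are the starting point for the full game.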
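Two different goals
Max-Score is trained to maximise its expected score. The Winner is trained to maximise its
chance of beating its opponent, even when that costs it some points on average.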
"Shared dice" — removing luck
When the agents are compared, both draw from the same dice stream, so any difference in
score comes from strategy rather than lucky rolls. The Winner was trained with the same
shared dice against a league of opponents, learning when to play safe and when to gamble.
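In simulation this idea is called common random numbers. Agents that hold different dice consume different numbers of rolls, which can knock two shared streams out of step. One way to keep the luck aligned anyway is to pre-draw a value for every die slot of every roll, so a die rerolled in slot i lands on the same value for both players. The sketch below uses that approach, with a toy sum-of-dice game standing in for real Yahtzee scoring. The table layout, function names and the two example policies are hypothetical.

```python
import random
from typing import Callable, List

# Hypothetical policy interface: current five dice -> which dice to keep.
Policy = Callable[[List[int]], List[bool]]


def make_dice_table(seed: int, turns: int, rolls_per_turn: int = 3) -> List[List[List[int]]]:
    """Pre-draw every die value a game could need: table[turn][roll][slot]."""
    rng = random.Random(seed)
    return [[[rng.randint(1, 6) for _ in range(5)]
             for _ in range(rolls_per_turn)]
            for _ in range(turns)]


def play(policy: Policy, table: List[List[List[int]]]) -> int:
    """Toy game (not real Yahtzee scoring): each turn, roll five dice,
    reroll the unkept ones twice, and score the sum of the final dice."""
    total = 0
    for turn in table:
        dice = list(turn[0])
        for roll in turn[1:]:
            keep = policy(dice)
            # A rerolled die in slot i always takes slot i's pre-drawn value,
            # so both agents face the same luck whatever they choose to hold.
            dice = [d if k else roll[i] for i, (d, k) in enumerate(zip(dice, keep))]
        total += sum(dice)
    return total


def compare(policy_a: Policy, policy_b: Policy, seed: int, turns: int = 13) -> int:
    """Score difference (a - b) when both policies play the same dice table."""
    table = make_dice_table(seed, turns)
    return play(policy_a, table) - play(policy_b, table)


keep_all: Policy = lambda dice: [True] * 5
keep_high: Policy = lambda dice: [d >= 4 for d in dice]
```

Because both policies see the same table, `compare(keep_high, keep_all, seed=42)` isolates the effect of the holding strategy. Averaging it over many seeds gives a lower-variance estimate of which strategy is better than comparing games played with independent dice.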
The hint button
The "What would the agents do?" button runs each agent's policy on your exact dice and
scorecard and shows its recommended move. Watch for positions where Max-Score and the Winner
disagree: that's the Winner taking a risk to win, even if it lowers its expected score.
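At its simplest, the hint runs each policy on the same state and then compares the results. Here is a minimal sketch, assuming each policy maps a game state to move probabilities and that the "recommended move" is the most probable one. `GameState`, `recommend`, `hint` and the move labels are hypothetical names, not the site's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class GameState:
    dice: Tuple[int, ...]   # the five dice currently showing
    filled: FrozenSet[str]  # hypothetical: scorecard categories already used
    rolls_left: int         # rerolls remaining this turn


# Hypothetical policy interface: game state -> {move label: probability}.
Policy = Callable[[GameState], Dict[str, float]]


def recommend(policy: Policy, state: GameState) -> str:
    """Greedy recommendation: the move the policy rates most likely."""
    probs = policy(state)
    return max(probs, key=probs.get)


def hint(state: GameState, policies: Dict[str, Policy]) -> Tuple[Dict[str, str], bool]:
    """Run every policy on the same state; return each recommended move
    and whether the policies disagree."""
    moves = {name: recommend(p, state) for name, p in policies.items()}
    return moves, len(set(moves.values())) > 1
```

Passing `{"Max-Score": ..., "Winner": ...}` as `policies` returns both suggestions plus a flag. Under these assumptions, the flag marks exactly the positions the paragraph above suggests watching.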