Monte Carlo Tree Search
by Anders Kierulf
Go-playing programs have made remarkable progress over the last decade. I want to give a brief overview of what has changed and how today’s top programs work.
Chess programs are based on tree search and position evaluation: look some moves ahead, evaluate the position, and select the move that leads to the position with the best evaluation (assuming best opponent defense). Do this well enough and fast enough, and you can beat anybody.
Go programs tried variations on that approach, and made very slow progress. The high number of legal moves at each position was one obstacle, but the main issue was the lack of a simple evaluation function. In chess, the material on the board is a good first approximation, and you can add adjustments to get a solid evaluation. In Go, you have to solve many separate tactical problems to find out what’s happening on the board and who is ahead. Those tactical problems include life and death, captures, and connections; once you have computed those, you can figure out territories, make assumptions about boundaries and open areas, and get a halfway reasonable evaluation. Unreliable and hard to scale; not a recipe for success.
Today’s top programs like Zen and Crazy Stone use Monte Carlo Tree Search (MCTS). The two components of that approach:
- Monte Carlo: Play thousands of semi-random games (playouts) from the current board position. Play all the way to the very end of the game, where evaluation is trivial.
- Tree Search: Build a tree of moves, and explore more promising moves more deeply.
This sounds crazy at first, but the reason it works is that it completely bypasses the difficult problem of evaluating the current position. Our goal is simply to choose the move that gives us a better chance of winning. With thousands of playouts from the current position, better moves tend to lead to more wins.
Monte Carlo programs are stronger when they just optimize their chance of winning, not the number of points they win by. This leads to good risk management: play safe moves that assure a win when you’re ahead, take risks when you’re behind.
The playouts are not completely random: they use heuristics to respond to standard patterns. The heuristics need to be carefully balanced to avoid bias. There are many refinements in MCTS, but the principles are simple, and properly implemented, this approach yields a scalable program where quality of play improves with added computing power.
While current top Go programs are approaching top amateur strength, they still exhibit a weakness when dealing with multiple tactically complicated situations on the board. The global search may be powerful enough to solve each life and death problem separately, but not together. Improvements in handling local fights without losing the benefits of global MCTS are needed.