Monte Carlo Tree Search: An Introduction (2024)

MCTS is the cornerstone of AlphaGo and many other AI applications. We aim to build some intuition and, along the way, get our hands dirty.


Monte Carlo Tree Search (MCTS) is an important algorithm behind many major successes of recent AI applications, such as AlphaGo’s striking victory in 2016.

In this blog, we will start with uninformed search, in which we simply traverse the whole search space to find the optimum. Examples include depth-first search and breadth-first search.

Then we describe how the MCTS algorithm works. Finally, we apply it to a toy problem: finding the most rewarding leaf node of a binary tree.

Uninformed search, as its name suggests, is a family of generic search algorithms that are given no information beyond an abstract problem definition. Although they can be applied universally given the right abstraction, they typically suffer from efficiency problems as the problem grows large.
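As a concrete illustration (a minimal sketch on a hand-made toy tree, not tied to any particular library), depth-first search might look like this:

```python
def dfs(node, goal, children):
    """Depth-first search: explore one branch fully before backtracking.
    `children` is a function mapping a node to its child nodes."""
    if node == goal:
        return [node]
    for child in children(node):
        path = dfs(child, goal, children)
        if path is not None:
            return [node] + path
    return None  # goal not reachable from this node

# A tiny example tree, written as an adjacency dict.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}
path = dfs("A", "E", lambda n: tree.get(n, []))  # -> ["A", "B", "E"]
```

Breadth-first search differs only in that it expands nodes level by level (a queue instead of recursion); both blindly enumerate the state space.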


A tree with branching factor b and depth d has b^d (read as b to the power of d) leaf nodes.

In games, we use a so-called game tree, where each node represents a state of the game and its children represent the possible next states, one for each action a player can take.


To give some idea, Go has an average branching factor of about 250 according to Wikipedia. A back-of-the-envelope calculation quickly shows how prohibitive uninformed search would be in Go:

  • At step 1: 250
  • At step 2: 250² = 62500
  • At step 3: 250³ = 15,625,000
  • At step 4: 250⁴ = 3,906,250,000
  • At step 5: 250⁵ = 976,562,500,000
  • At step 10: 250¹⁰ = 953,674,316,406,250,000,000,000
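The numbers above can be reproduced in a couple of lines of Python:

```python
# Number of possible move sequences after n moves,
# assuming a constant branching factor of 250.
for n in (1, 2, 3, 4, 5, 10):
    print(n, 250 ** n)
```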

After just 10 steps, we already see an astronomical number of possible game states. This also gives some sense of why the win of AlphaGo is such a milestone for humankind.

Now it is clear that we need something smarter than uninformed search to navigate a gigantic state space like the game of Go. MCTS is one such smart algorithm that gives us a way out.

Essentially, MCTS uses Monte Carlo simulation to accumulate value estimates that guide the search towards highly rewarding trajectories in the search tree. In other words, MCTS pays more attention to the more promising nodes, so it avoids having to brute-force every possibility, which would be impractical.

At its core, MCTS consists of repeated iterations (ideally infinitely many; in practice constrained by computing time and resources) of four steps: selection, expansion, simulation and backup.

Selection

In this step, we use the tree policy to construct a path from the root to the most promising leaf node.

A leaf node here is a node that still has unexplored child node(s).

A tree policy is an informed policy used for action (node) selection in the snowcap (the explored upper part of the game tree, as opposed to the vast unexplored part below). One important consideration for such a policy is the balance of exploration vs. exploitation, a recurring topic in AI, especially in Reinforcement Learning.

In the algorithm behind AlphaGo, a UCB-based policy is used. More specifically, each node has an associated UCB value, and during selection we always choose the child node with the highest UCB value.
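As a sketch (the exact constants and bookkeeping vary by implementation, and AlphaGo’s actual policy is more elaborate), UCB1-based child selection can look like this:

```python
import math

def ucb(parent_visits, child_visits, child_total_reward, c=1.41):
    """UCB1 score: average reward (exploitation) + exploration bonus."""
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    return (child_total_reward / child_visits
            + c * math.sqrt(math.log(parent_visits) / child_visits))

# Children represented as (total_reward, visits); parent visited 20 times.
children = [(9.0, 10), (1.0, 10)]
best = max(range(len(children)),
           key=lambda i: ucb(20, children[i][1], children[i][0]))  # -> 0
```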


In UCB1, a node’s score is its average reward plus an exploration bonus proportional to sqrt(ln N / n_i), where n_i is the node’s visit count and N its parent’s. As we can see, the more a node i is visited, the smaller the second term in UCB becomes, hence decreasing its probability of being selected again. The UCB algorithm therefore has exploration and exploitation inherently built in. Smart and neat!

For a more rigorous yet still easy-to-follow account of exploration vs exploitation (including UCB), please read Lilian Weng’s blog on this topic.

Finally, in the selection step, we traverse the snowcap, i.e., the explored part of the tree, until a leaf node is reached.

Expansion

Remember that we reach a leaf node at the end of selection?

In the expansion step, we simply pick one of the leaf node’s unexplored children at random and add it to the tree.
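A minimal expansion sketch (the dict-based node layout and the `untried_actions` field are made up for illustration):

```python
import random

def expand(node):
    """Expansion: pick one of the leaf's untried actions at random,
    remove it from the untried list, and attach the resulting child."""
    action = random.choice(node["untried_actions"])
    node["untried_actions"].remove(action)
    child = {"state": action, "untried_actions": [], "children": []}
    node["children"].append(child)
    return child

leaf = {"state": "s", "untried_actions": ["a", "b"], "children": []}
child = expand(leaf)
```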

Simulation (or Roll-out)

During this step, we roll out one or more simulations from the newly expanded node, accumulating a reward for each simulation. The roll-out policy is normally simple, or even purely random, so that it is fast to execute. For example, a win might yield a reward of +1, a draw 0, and a loss -1.
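A random roll-out can be sketched like this (the four callbacks are hypothetical hooks a game would supply; the coin-flip game below is made up purely to exercise them):

```python
import random

def rollout(state, legal_actions, step, is_terminal, reward):
    """Default (random) roll-out policy: play uniformly random legal
    actions from `state` until a terminal state, then return its reward."""
    while not is_terminal(state):
        state = step(state, random.choice(legal_actions(state)))
    return reward(state)

# Toy game: start at 0, add +1 or -1 for five turns; win (+1) if positive.
result = rollout(
    (0, 0),                                          # (position, turn)
    lambda s: [1, -1],                               # legal actions
    lambda s, a: (s[0] + a, s[1] + 1),               # apply an action
    lambda s: s[1] == 5,                             # terminal after 5 turns
    lambda s: 1 if s[0] > 0 else (0 if s[0] == 0 else -1),
)
```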

One might ask, why bother?

In the game of Go, this step is what a Go master might do in his or her head: playing the game out many times until its end. The true genius not only sees the future, but many versions of it.

Backup

Now that we have played out many simulations, what can we do with them to inform the best next move?

We use the accumulated rewards of simulations to back up and update the values of nodes in the snowcap.

Note: we don’t update the values of nodes visited during the roll-outs! We do this because we need to focus on the vicinity of the root node (the snowcap), based on which we make decisions about the next move(s). The values outside the snowcap are not relevant for such decisions, nor would it be computationally efficient to store and update them.
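A minimal backup sketch, assuming each node simply stores a visit count and a running reward total:

```python
def backup(path, reward, num_rollout=1):
    """Propagate the accumulated simulation reward up the selected path.
    Only nodes on the path (i.e., inside the snowcap) are updated."""
    for node in path:
        node["visits"] += num_rollout
        node["total_reward"] += reward

# A selected root-to-leaf path of three nodes.
path = [{"visits": 0, "total_reward": 0.0} for _ in range(3)]
backup(path, reward=1.0)
```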

Putting it together


The pseudo-code might look like below:

def run(node, num_rollout):
    """One iteration of select -> expand -> simulate -> backup."""
    path = select(node)           # selection: follow the tree policy to a leaf
    leaf = path[-1]
    expand(leaf)                  # expansion: add an unexplored child
    reward = 0
    for i in range(num_rollout):
        reward += simulate(leaf)  # simulation: random roll-out(s)
    backup(path, reward)          # backup: update values along the path

Note that this is one iteration of MCTS for a given root node, i.e., a node in the game tree. In practice, we run as many iterations of these four steps as possible, and the simulation step of each iteration can perform multiple roll-outs. Once computation time or resources are exhausted, we stop the iteration loop and decide on the next move to take. That leaves us with a new root node, and we run the iterations again, and so on…

(Code available at: git repo)

Let’s suppose we have a binary tree of depth 7, whose leaf nodes each carry a reward between 0 and 100, drawn from a uniform distribution. Our task is to find the highest reward among the 2⁷ = 128 leaf nodes. (A small note: leaf node is used here in its rudimentary meaning, i.e., a node without children, unlike in MCTS where we use leaf node to denote a node with unexplored children.)
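For intuition, here is a compact, self-contained sketch of the whole toy problem. This is illustrative code written for this post, not the repo’s find_max_leaf_binary_tree.py; all names and constants in it are made up:

```python
import math
import random

class Node:
    """A node in the search tree, identified by its 0/1 path from the root."""
    def __init__(self, depth, path=""):
        self.depth, self.path = depth, path
        self.children, self.visits, self.total = [], 0, 0.0

def leaf_reward(path):
    # Deterministic pseudo-random reward in [0, 100] for each depth-7 path.
    return random.Random(path).uniform(0, 100)

def ucb(parent, child, c=2.0):
    if child.visits == 0:
        return float("inf")
    return child.total / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def mcts_iteration(root, max_depth=7, num_rollout=3):
    # 1. Selection: descend by UCB while the node is fully expanded.
    path, node = [root], root
    while len(node.children) == 2 and node.depth < max_depth:
        node = max(node.children, key=lambda ch: ucb(node, ch))
        path.append(node)
    # 2. Expansion: add one unexplored child (if not at the bottom).
    if node.depth < max_depth:
        node = Node(node.depth + 1, node.path + str(len(node.children)))
        path[-1].children.append(node)
        path.append(node)
    # 3. Simulation: random walk down to a depth-7 leaf, collect its reward.
    reward = 0.0
    for _ in range(num_rollout):
        p = node.path
        while len(p) < max_depth:
            p += random.choice("01")
        reward += leaf_reward(p)
    # 4. Backup: update every node on the selected path.
    for n in path:
        n.visits += num_rollout
        n.total += reward

root = Node(0)
for _ in range(2000):
    mcts_iteration(root)

# Read off the answer: follow the most-visited children downward.
best = root
while best.children:
    best = max(best.children, key=lambda ch: ch.visits)
```

With enough iterations, the most-visited path tends to end at (or near) the highest-reward leaf, without ever enumerating all 128 of them.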


The reader can check out my git repo and run find_max_leaf_binary_tree.py. The code conveniently exposes a few parameters:

  • numb_iter to change number of MCTS iterations
  • num_rollout to change number of roll-outs during simulation
  • exploration_weight to control the balance between exploration and exploitation

Enough to play with!

The default run shows results below:

Expected (max) leaf node: node@7_53, value: 99.16984625936269
Expected (min) leaf node: node@7_121, value: 1.2521064278136151
Found optimal (max) leaf node: node@7_104, value: 95.8546572417074

Not too bad: the found optimal node is pretty close to the expected one (note that you might not get the exact same result due to the probabilistic nature of the algorithm).

Many AI problems have their roots in search problems. After all, life is about searching. Hope you all find your optimal path :).


