david.pfx
david.pfx

Reputation: 10863

Monte Carlo Tree Search picks bad moves that lose immediately

I have a working implementation of an MCTS. I am testing it on Tic-Tac-Toe for convenience, but the real target game will be rather larger.

As far as I can tell it generates the correct win counts and explores the tree the way it should, and mostly generates good moves. However, there are situations where the chosen move leads to an immediate loss. Likewise it may not choose a move that leads to a forced win.

My analysis of the win counts is that these are moves where it is actually true that on random play the move will win more than it loses, and the line played by a skilled opponent is not given enough weight.

So the question is how to either (a) tune the MCTS to favour short term wins and avoid short term losses, or (b) add features to the search to that forced lines are detected and prioritised.

Someone who has actually written and completed an MCTS might be best able to answer the question. There are other similar questions on SO and SWE but no helpful answers.


To be clear, my implementation of MCTS achieves perfect play on TTT Traffic Lights variant in around 500 iterations, but with 100-200 iterations will sometimes play really bad moves, missing immediate wins or losses. By contrast minimax takes only 81 iterations to avoid bad moves completely (2 ply) but shows general weak play with 729 iterations (3 ply). Anyone with working implementations of these algorithms should see similar results.

MCTS converges to good play, but I'm looking to improve how it performs on sudden death games with low iteration counts. I find Traffic Lights makes a good test bed.

Upvotes: 4

Views: 1546

Answers (1)

Taylor Vance
Taylor Vance

Reputation: 1001

I'm trying to solve this same problem. What if your "find moves" code checks for a win for each legal move? If there is an immediate win, return ONLY that move to MCTS--ignore the other technically legal but very bad moves. In this way, instead of pruning bad branches, those bad branches never get added to the tree in the first place. This will probably slow down computation considerably but might be worth it if it's playing more intelligently in the short term.

Upvotes: 1

Related Questions