Reputation: 13
I am trying to implement Monte Carlo tree search (MCTS) for a rather complicated game, but there is something I don't understand about it (maybe this applies to other search algorithms too; I'm not sure). Sorry if my English isn't clear enough.
My question is: what do the tree nodes represent - possible game states, or possible moves the player can make? I'll try to explain why neither makes sense to me.
If the nodes are game states, why does it make sense to choose the move that may lead to the best one? It is quite possible that my opponent will choose a different reply that leads to a very bad game state for me.
If the nodes are possible moves, the same logic applies, just one level down - there is no point in choosing the best child of each move, because there is no guarantee that my opponent will play in a way that lets me play that move. Instead, it would make sense to look at children more randomly, but that contradicts the way this method is supposed to work.
So I guess the common implementation is some sort of mix between the two? I did do some research, and I know that search algorithms sometimes use a deterministic opponent with a simple strategy during the search. That wouldn't work for me, because there is no intuitive strategy for the game I am implementing (which is why I chose the Monte Carlo method - no evaluation function needed). It feels like there is something really simple I'm missing.
I hope my question is clear enough, and I'll be grateful for any help.
Upvotes: 1
Views: 385
Reputation: 28839
Nodes in the game tree represent states - the edges are the moves that lead from one state to another.
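To make the state/move distinction concrete, here is a minimal sketch (all names hypothetical, not from any particular MCTS library): each node stores a game state, and each child is reached by applying one legal move, so the move labels the edge rather than the node.

```python
# Sketch: nodes hold game states; edges (the `move` field) are the
# moves that lead from a parent's state to a child's state.
class Node:
    def __init__(self, state, move=None, parent=None):
        self.state = state      # the game state this node represents
        self.move = move        # the move (edge) that led here from the parent
        self.parent = parent
        self.children = []

    def expand(self, legal_moves, apply_move):
        # Create one child per legal move; the move itself labels the edge.
        for m in legal_moves(self.state):
            child = Node(apply_move(self.state, m), move=m, parent=self)
            self.children.append(child)
```

Here `legal_moves` and `apply_move` stand in for whatever your game's rules provide; the point is only that "state" lives in the node and "move" lives on the edge between nodes.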
In traditional MiniMax game search, which is what I am familiar with, the best move from a particular position is the move that denies the opponent her opportunity to play the best potential move at the next level, and so on, until the search depth is exhausted.
As an example, in a two-ply search (one full move of look-ahead): suppose that when the first player plays move X1, player two has the potential responses Y1, Y2, and Y3 with respective scores (from her perspective) of 1, 2, and 3. Then X1's score from the first player's perspective becomes -3, because player two will pick her best reply. If player 1's alternative move X2 comes back with a score of, say, -4, then X1 is the better initial move (it denies player two the opportunity to achieve her score of 4); but if the X2-derived score is -2, then X2 becomes the better move at the top of the tree.
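The two-ply scoring above can be sketched in a few lines (the move names and the tree layout are just the example's, not any real game's): the opponent maximizes her reply score, and our score for a move is the negation of her best reply.

```python
# Two-ply negamax sketch for the example above.
# `replies` maps the opponent's responses to scores from HER perspective.
def score_move(replies):
    # The opponent picks her best reply; our score is its negation.
    return -max(replies.values())

def best_move(tree):
    # `tree` maps our candidate moves to their reply dicts.
    scores = {move: score_move(replies) for move, replies in tree.items()}
    return max(scores, key=scores.get), scores
```

With replies Y1/Y2/Y3 scoring 1, 2, 3 under X1, X1 scores -3; if X2 allows a reply worth 4, X2 scores -4 and X1 wins the comparison.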
The way I understand it, Monte Carlo tree search limits the number of game nodes examined to a random subset. With Monte Carlo methods you don't have to search all the way to a terminal node (in games where one can always be reached), though you can; and the approach can also be combined with an evaluation function, if desired.
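The random-sampling idea can be sketched as a plain rollout evaluator (a simplification of full MCTS; the game-interface functions here are hypothetical placeholders): estimate a position's value by playing many random games to the end and averaging the results, so no evaluation function is needed.

```python
import random

# Sketch: Monte Carlo evaluation by random playouts.
# `legal_moves`, `apply_move`, `is_terminal`, and `result` are
# placeholders for whatever your game's rules provide.
def rollout_value(state, legal_moves, apply_move, is_terminal, result,
                  n_playouts=100, rng=random):
    total = 0.0
    for _ in range(n_playouts):
        s = state
        while not is_terminal(s):
            # Play uniformly random moves until the game ends.
            s = apply_move(s, rng.choice(legal_moves(s)))
        # e.g. +1 win, 0 draw, -1 loss for the player to move at `state`.
        total += result(s)
    return total / n_playouts
```

Full MCTS then uses these playout averages to decide which parts of the tree deserve more samples, rather than sampling every move equally.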
Upvotes: 1