Reputation: 11
Could anybody please clarify how (as I have not found any clear example anywhere) The MCTS algorithm iterates for the second player.
Everything I seem just seems to look like it is playing eg P1 move every time. I understand the steps for one agent but I never find anything showing code where P2 places its counter, which surely must happen when growing the tree.
Essentially I would expect:
for each iter:
select node Player1 expand Player1
select node Player2 expand player 2
rollout backpropogate
next iter
Is this right?? Could anybody please spell out some psuedocode showing that? Either iteratively or recursion i don't mind.
Thanks for any help.
Upvotes: 0
Views: 475
Reputation: 5455
The trick is in backpropagation part, where you update "wins" variable from the point of view of player whose move led into this position.
Notice under UCT function, specially the comments:
#Backpropagate
while node != None: # backpropagate from the expanded node and work back to the root node
node.Update(state.GetResult(node.playerJustMoved)) # state is terminal. Update node with result from POV of node.playerJustMoved
node = node.parentNode
IF you follow the function call, you would realize visit
variable is always updated; wins
however, is not.
Upvotes: 1