Reputation: 165
I tried to implement tic-tac-toe hello-world MCTS game player but I encountered a problem.
While simulating the game and choosing "the most promising" (exploit/explore) node I only take total wins number into account ("exploit" part) - this causes certain problem, the resulting algorithm is not defensive at all. As a result when choosing between
the worse one is chosen (1; 109) because my uct function greedily counts avg wins instead of "value".
Am I identyfing this problem correctly? Should I switch from "avg wins" to some other value metric that takes all results types into account ?
Any advice is welcome, thanks
Upvotes: 2
Views: 463
Reputation: 8518
Since tic-tac-toe is a zero-sum game (value of a state for one player is always equal to the negated value for the opponent), your score function should also reflect this.
This means that, when computing average scores, you should use values like the following:
In your example, this would lead to the following average scores:
-10/110 = -0.0909...
-108/110 = -0.98181...
So the first option would be viewed as the better option
Upvotes: 3