Kamil Czarnogorski
Kamil Czarnogorski

Reputation: 165

Monte Carlo Tree Search - "most promising" move function

I tried to implement tic-tac-toe hello-world MCTS game player but I encountered a problem.

While simulating the game and choosing "the most promising" (exploit/explore) node I only take total wins number into account ("exploit" part) - this causes certain problem, the resulting algorithm is not defensive at all. As a result when choosing between

the worse one is chosen (1; 109) because my uct function greedily counts avg wins instead of "value".

Am I identyfing this problem correctly? Should I switch from "avg wins" to some other value metric that takes all results types into account ?

Any advice is welcome, thanks

Upvotes: 2

Views: 463

Answers (1)

Dennis Soemers
Dennis Soemers

Reputation: 8518

Since tic-tac-toe is a zero-sum game (value of a state for one player is always equal to the negated value for the opponent), your score function should also reflect this.

This means that, when computing average scores, you should use values like the following:

  • +1 for every win
  • 0 for every draw
  • -1 for every loss

In your example, this would lead to the following average scores:

  • move that results in (100 draws; 10 loses): avg. score = -10/110 = -0.0909...
  • move that results in (1 wins; 109 loses): avg. score = -108/110 = -0.98181...

So the first option would be viewed as the better option

Upvotes: 3

Related Questions