Reputation: 61
As you can see in the image, the AlphaGo Zero neural network is trained with a loss function that uses the MCTS probabilities and value as ground-truth labels. I am trying to understand whether the outputs of the neural network are treated as logits (i.e. unbounded real values) or as raw probabilities (values in [0,1]). In the loss function, it looks like the MCTS probabilities (which I am confident lie in [0,1]) are combined in a dot product with the log of the NN probabilities. This term enters the loss with a negative sign, but what does its magnitude indicate about the similarity of the two vectors? Why does a larger value of the dot product indicate more similarity?
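To make the term concrete, here is a minimal sketch of the quantity I am asking about, written as a cross-entropy between an MCTS-style target vector and the network's policy. It assumes (and this is an assumption, not something I have confirmed from the image) that the policy head emits logits that are passed through a softmax before the log is taken. The names `softmax`, `cross_entropy`, `pi`, `p_close`, and `p_far` are made up for illustration, and the numbers are arbitrary.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(pi, p, eps=1e-12):
    # Computes -pi^T log p; eps guards against log(0).
    return -sum(t * math.log(q + eps) for t, q in zip(pi, p))

# Hypothetical MCTS visit-count distribution over three moves.
pi = [0.7, 0.2, 0.1]

# Two hypothetical network outputs (as logits): one ranks the moves
# like pi does, the other ranks them in the opposite order.
p_close = softmax([2.0, 0.7, 0.0])
p_far = softmax([0.0, 0.7, 2.0])

loss_match = cross_entropy(pi, pi)       # p identical to pi
loss_close = cross_entropy(pi, p_close)  # p similar to pi
loss_far = cross_entropy(pi, p_far)      # p dissimilar to pi

print(loss_match, loss_close, loss_far)
```

In this sketch the loss term is smallest when `p` equals `pi` and grows as `p` moves away from `pi`, which is the same as saying the dot product of `pi` with `log p` (a non-positive number) is largest when the two vectors agree.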
Upvotes: 1
Views: 746