Reputation: 1282
I am taking the lectures of Stanford's CS231n course. There is a point about RNNs I am unable to understand: why does softmax not select the highest probability, which is 0.84 for the character 'o' (in the attached example), instead of 0.13 for the character 'e'? An explanation would be highly appreciated.
Upvotes: 0
Views: 216
Reputation: 1
Basically, it is because they used sampling: they drew a sample from the probability distribution given by softmax, which can technically yield any character in the vocabulary whose probability is non-zero. As they said in the video, they got "lucky" and drew the character they were expecting, and did so so that the illustration would make sense. Had they used argmax decoding instead of sampling, they would always have picked the character with the highest probability in the distribution (which is 'o' in the illustration).
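Here is a minimal sketch of the difference, using the two probabilities visible in the illustration (0.13 for 'e', 0.84 for 'o'); the other two entries are made-up fillers so the distribution sums to 1:

```python
import numpy as np

# Four-character vocabulary from the lecture's "hello" example.
vocab = ['h', 'e', 'l', 'o']
# Softmax output for the first time step; only the 'e' and 'o'
# values come from the slide, the rest are illustrative.
probs = np.array([0.03, 0.13, 0.00, 0.84])

# Argmax decoding: deterministic, always the most likely character.
print(vocab[np.argmax(probs)])  # 'o', every time

# Sampling: draws according to the distribution, so 'e' comes out
# roughly 13% of the time even though 'o' is far more likely.
rng = np.random.default_rng(0)
samples = [rng.choice(vocab, p=probs) for _ in range(10)]
print(samples)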
Upvotes: 0
Reputation: 544
I have not actually watched the lecture, but I think the 'e' at the top is the expected output (and 'l', 'l', 'o' as well). The initial weights are not yet giving good enough results (predicting 'o' instead of 'e'). As you train the network, the weights mature, the probabilities shift, and ultimately the first prediction will be 'e'.
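To make that concrete, here is a toy sketch (not the lecture's actual code) of a single softmax layer fed a one-hot 'h'; gradient descent on the cross-entropy loss against the target 'e' steadily raises p('e'):

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
target = vocab.index('e')           # expected next character

rng = np.random.default_rng(42)
W = rng.normal(scale=0.5, size=(4, 4))  # random initial weights
x = np.eye(4)[vocab.index('h')]         # one-hot input 'h'

for step in range(101):
    logits = W.T @ x
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax
    if step % 50 == 0:
        print(f"step {step:3d}: p('e') = {probs[target]:.2f}")
    # Cross-entropy gradient w.r.t. the logits is (probs - one_hot_target).
    dlogits = probs.copy()
    dlogits[target] -= 1.0
    W -= 1.0 * np.outer(x, dlogits)      # SGD step, learning rate 1.0
```

Running this, p('e') starts near random and climbs toward 1, which is exactly the "weights becoming more mature" effect described above.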
Upvotes: 0