Reputation: 79
I made a neural network with Keras in Python and cannot really understand what the value of the loss function means.
First, some general information: I worked with the poker hand dataset, which has classes 0-9, and encoded the labels as one-hot vectors. I used the softmax activation in the last layer, so my output gives me, for each of the 10 entries in the vector, the probability that the sample belongs to that class. For example: my real label is (0,1,0,0,0,0,0,0,0,0), which means class 1 (classes 0-9 run from no card to royal flush), and class 1 means one pair (if you know poker). From the neural net I get outputs like (0.4, 0.2, 0.1, 0.1, 0.2, 0, 0, 0, 0, 0), which means that my sample belongs with 40 percent probability to class 0, with 20 percent to class 1, and so on!
Alright! I also used binary cross-entropy as the loss, accuracy as the metric, and the RMSprop optimizer. When I call model.evaluate() from Keras, I get something like 0.16 for the loss and I do not know how to interpret this. Does it mean that, on average, my predictions deviate by 0.16 from the truth? So if my prediction for class 0 is 0.5, could it also be 0.66 or 0.34? Or how should I interpret it?
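For reference, here is roughly how the model is set up (a minimal sketch, not my exact code; the layer sizes and variable names are placeholders):

```python
# Rough sketch of the described setup; layer sizes and data variables are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10  # poker hand classes 0-9, labels one-hot encoded

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(10,)),  # 10 raw card features
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",   # the loss I used
              metrics=["accuracy"])

# loss, acc = model.evaluate(x_test, y_test)   # gives a loss around 0.16
```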
Please send help!
Upvotes: 0
Views: 555
Reputation: 5449
First of all, according to your problem definition you have a multi-class problem. Thus, you should use categorical_crossentropy. binary_crossentropy is for two-class problems or for multi-label classification.
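In Keras that is just a different loss identifier at compile time; a minimal sketch of the change (optimizer and metric taken from your description):

```python
# Use categorical cross entropy for a multi-class problem with one-hot labels.
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```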
But generally the value of the loss function is only meaningful in relation to other values; to interpret it, you first have to understand what cross entropy means. The formula is:
$$\text{CE} = -\sum_{c=1}^{M} y_{o,c}\,\log(p_{o,c})$$

where

$M$ is the number of classes,
$y_{o,c}$ is the binary indicator (0 or 1) of whether class label $c$ is the correct classification for observation $o$, and
$p_{o,c}$ is the predicted probability that observation $o$ is of class $c$.

For binary cross entropy, $M = 2$. For categorical cross entropy, $M > 2$.
Therefore, the cross entropy decreases as the predicted probability converges to the actual label, approaching 0 as the probability assigned to the correct class approaches 1.
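To make the formula concrete, here is a small sketch (plain NumPy, not part of the original answer) that computes the categorical cross entropy of a one-hot target against a few predictions:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    # -sum over classes of y_true * log(y_pred); only the correct class contributes
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])  # the one-hot label from the question

# The loss shrinks as the probability assigned to the correct class grows.
for p_correct in (0.2, 0.5, 0.85, 0.99):
    rest = (1 - p_correct) / 9           # spread the remaining mass over the other 9 classes
    y_pred = np.full(10, rest)
    y_pred[1] = p_correct
    print(p_correct, categorical_cross_entropy(y_true, y_pred))
    # prints roughly 1.61, 0.69, 0.16, 0.01
```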
Now let's take your example, where you have 10 classes and your true label is (0,1,0,0,0,0,0,0,0,0).
If you have a loss of 0.16, then, since only the correct class contributes to the sum,

$$-\log(p_{o,1}) = 0.16 \quad\Rightarrow\quad p_{o,1} = e^{-0.16} \approx 0.85,$$

which means that your model has assigned about 0.85 to the correct label.
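A quick numeric check of that step (just arithmetic, not part of the original answer):

```python
import math

loss = 0.16
p_correct = math.exp(-loss)     # invert -log(p) = loss
print(p_correct)                # ~0.852
print(-math.log(0.85))          # ~0.163, back to roughly the observed loss
```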
Therefore, the loss function gives you the negative log of the probability assigned to the correct class. Since Keras computes the loss over whole batches, the reported value is the average of the negative log of the correct-class probability over all the data in the specific batch. If you use the evaluate function, it is that average over all the data you are evaluating.
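If you want to verify that, here is a small sketch (assuming the model was compiled with categorical_crossentropy and that model, x_test and y_test are already defined):

```python
# The loss reported by model.evaluate should match the mean per-sample
# categorical cross entropy computed by hand.
import numpy as np

probs = model.predict(x_test)                          # softmax outputs, shape (n, 10)
per_sample = -np.sum(y_test * np.log(probs), axis=1)   # cross entropy per sample
print(per_sample.mean())                               # ~ loss from model.evaluate(x_test, y_test)
```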
Upvotes: 1