Reputation: 1205
I have
y_true = 16
and
y_pred = array([1.1868494e-08, 1.8747659e-09, 1.2777099e-11, 3.6140797e-08,
6.5852622e-11, 2.2888577e-10, 1.4515833e-09, 2.8392664e-09,
4.7054605e-10, 9.5605066e-11, 9.3647139e-13, 2.6149302e-10,
2.5338919e-14, 4.8815413e-10, 3.9381631e-14, 2.1434269e-06,
9.9999785e-01, 3.0857247e-08, 1.3536775e-09, 4.6811921e-10,
3.0638234e-10, 2.0818169e-09, 2.9950772e-10, 1.0457132e-10,
3.2959850e-11, 3.4232595e-10, 5.1689473e-12], dtype=float32)
When I use tf.keras.losses.categorical_crossentropy(to_categorical(y_true, num_classes=27), y_pred, from_logits=True), the loss value I get is 2.3575358.
But if I use the formula for categorical cross-entropy to compute the loss value,
-np.sum(to_categorical(gtp_out_true[0], num_classes=27) * np.log(gtp_pred[0]))
I get the value 2.1457695e-06.
Now, my question is: why does tf.keras.losses.categorical_crossentropy give a different value?
The strange thing is that my model gives 100% accuracy even though the loss is stuck at 2.3575. Below is a plot of the accuracy and loss during training.
What formula does Tensorflow use to calculate categorical cross-entropy?
Upvotes: 1
Views: 5730
Reputation: 20372
You are passing y_pred as a probability vector, so you should not use from_logits=True. Set it to False and you get:
>>> print(categorical_crossentropy(to_categorical(16, num_classes=27),
...                                y_pred, from_logits=False).numpy())
2.264979e-06
The reason it is not equal to the expected 2.1457695e-06 is, I believe, that y_pred[16] is very close to 1.0 and categorical_crossentropy renormalizes and clips the probabilities internally before taking the log.
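For reference, here is a minimal NumPy sketch of roughly what the Keras backend computes when from_logits=False (the epsilon value and exact order of operations may differ between TF versions):

import numpy as np

def keras_style_cce(y_true_onehot, y_pred, eps=1e-7):
    # Rough sketch: renormalize the probabilities, clip them away from 0 and 1,
    # then take the negative log-likelihood of the true class.
    p = np.asarray(y_pred, dtype=np.float32)
    p = p / np.sum(p, axis=-1, keepdims=True)
    p = np.clip(p, eps, 1.0 - eps)
    return -np.sum(y_true_onehot * np.log(p), axis=-1)

With y_pred[16] this close to 1.0, the renormalization and float32 rounding alone can shift the result by roughly 1e-7, which is about the size of the gap between the two numbers above.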
See the answer here for a discussion on logits: What is the meaning of the word logits in TensorFlow?
You can also use the sparse version of the function if each input value can only have one label:
print(sparse_categorical_crossentropy(16, y_pred))
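For example, with the y_pred from the question, both calls should give essentially the same value; the sparse version simply takes the integer class index instead of a one-hot vector:

from tensorflow.keras.losses import (categorical_crossentropy,
                                     sparse_categorical_crossentropy)
from tensorflow.keras.utils import to_categorical

# Same loss, two label encodings: one-hot vector vs. integer class index.
dense_loss = categorical_crossentropy(to_categorical(16, num_classes=27), y_pred)
sparse_loss = sparse_categorical_crossentropy(16, y_pred)
print(dense_loss.numpy(), sparse_loss.numpy())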
Upvotes: 1
Reputation: 1205
Found where the problem is.
I used softmax activation in my last layer:
output = Dense(NUM_CLASSES, activation='softmax')(x)
But I also used from_logits=True in tf.keras.losses.categorical_crossentropy, which resulted in softmax being applied again to the output of the last layer (which was already softmax(logits)). So the output argument I was passing to the loss function was effectively softmax(softmax(logits)), hence the anomaly in the loss values.
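This also explains the exact value the loss got stuck at. A quick NumPy sanity check, using an idealized, nearly one-hot probability vector in place of the real model output:

import numpy as np

num_classes = 27
p = np.full(num_classes, 1e-9)  # an (almost) one-hot softmax output
p[16] = 1.0

# from_logits=True applies another softmax, which squashes the distribution:
# the true class ends up with probability e / (e + 26).
double_softmax = np.exp(p) / np.sum(np.exp(p))
print(double_softmax[16])           # ≈ 0.0947
print(-np.log(double_softmax[16]))  # ≈ 2.3576, the value the loss was stuck at

Because the second softmax is monotonic, the argmax (and therefore the accuracy) is unaffected, which is why training could show 100% accuracy while the loss plateaued around 2.357.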
When using softmax as the activation in the last layer, we should use from_logits=False.
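In other words, either of these two configurations is consistent (just a sketch, reusing the x and NUM_CLASSES from the model above):

import tensorflow as tf
from tensorflow.keras.layers import Dense

# Option 1: softmax activation in the last layer -> the loss receives probabilities.
probs = Dense(NUM_CLASSES, activation='softmax')(x)
loss_fn_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)

# Option 2: no activation in the last layer -> the loss receives raw logits and
# applies the softmax itself, which is generally the more numerically stable choice.
logits = Dense(NUM_CLASSES)(x)
loss_fn_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)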
Upvotes: 2