Siladittya

Reputation: 1205

tf.keras.losses.categorical_crossentropy returning wrong value

I have

y_true = 16

and

y_pred = array([1.1868494e-08, 1.8747659e-09, 1.2777099e-11, 3.6140797e-08,
                6.5852622e-11, 2.2888577e-10, 1.4515833e-09, 2.8392664e-09,
                4.7054605e-10, 9.5605066e-11, 9.3647139e-13, 2.6149302e-10,
                2.5338919e-14, 4.8815413e-10, 3.9381631e-14, 2.1434269e-06,
                9.9999785e-01, 3.0857247e-08, 1.3536775e-09, 4.6811921e-10,
                3.0638234e-10, 2.0818169e-09, 2.9950772e-10, 1.0457132e-10,
                3.2959850e-11, 3.4232595e-10, 5.1689473e-12], dtype=float32)

When I use

tf.keras.losses.categorical_crossentropy(to_categorical(y_true, num_classes=27), y_pred, from_logits=True)

the loss value I get is 2.3575358.

But if I use the formula for categorical cross entropy to get the loss value

-np.sum(to_categorical(y_true, num_classes=27) * np.log(y_pred))

according to the formula CE = -Σᵢ yᵢ log(ŷᵢ),

I get the value 2.1457695e-06.

Now, my question is: why does tf.keras.losses.categorical_crossentropy give a different value?
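For reference, here is a minimal snippet that reproduces both numbers (assuming TensorFlow 2.x; y_pred is abbreviated to its two dominant entries, with the rest set to a tiny constant):

import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

y_true = 16
# abbreviated stand-in for the softmax output shown above
y_pred = np.full(27, 1e-9, dtype=np.float32)
y_pred[15] = 2.1434269e-06
y_pred[16] = 9.9999785e-01

one_hot = to_categorical(y_true, num_classes=27)

# Keras call, as used above
print(tf.keras.losses.categorical_crossentropy(one_hot, y_pred,
                                               from_logits=True).numpy())
# roughly 2.36

# manual categorical cross-entropy
print(-np.sum(one_hot * np.log(y_pred)))
# roughly 2.1e-06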

The strange thing is that my model reaches 100% accuracy even though the loss is stuck at 2.3575. Below is a plot of the accuracy and loss during training.

[plot of training accuracy and loss]

What formula does Tensorflow use to calculate categorical cross-entropy?

Upvotes: 1

Views: 5730

Answers (2)

Gaslight Deceive Subvert

Reputation: 20372

y_pred is a probability vector, so you should not use from_logits=True. Set it to False and you get:

>>> print(categorical_crossentropy(to_categorical(16, num_classes = 27),
                                   y_pred, from_logits = False).numpy())
2.264979e-06

The reason it is not equal to the expected 2.1457695e-06 is, I believe, that y_pred[16] is very close to 1.0 and categorical_crossentropy adds some smoothing.

See the answer here for a discussion on logits: What is the meaning of the word logits in TensorFlow?
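If you really do want from_logits=True, you have to pass pre-softmax values. As a rough sketch, reusing the y_pred probability vector from the question: taking its log gives a valid set of logits (since softmax(log(p)) = p / sum(p)), and the loss drops back to the expected tiny value:

import numpy as np
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.utils import to_categorical

logits = np.log(y_pred)   # undoes the softmax, up to an additive constant
print(categorical_crossentropy(to_categorical(16, num_classes=27),
                               logits, from_logits=True).numpy())
# roughly 2e-06 again, instead of the ~2.36 you get when
# probabilities are passed as if they were logits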

You can also use the sparse version of the function, which takes the integer label directly instead of a one-hot vector:

print(sparse_categorical_crossentropy(16, y_pred))

Upvotes: 1

Siladittya

Reputation: 1205

Found where the problem is

I used a softmax activation in my last layer:

output = Dense(NUM_CLASSES, activation='softmax')(x)

But I used from_logits=True in tf.keras.losses.categorical_crossentropy, which applies softmax again to the output of the last layer (which was already softmax(logits)). So the values that actually reached the loss function were softmax(softmax(logits)).

Hence the anomaly in the loss values.

When using softmax as the activation in the last layer, we should use from_logits=False.
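To spell it out, these are roughly the two consistent combinations (the input shape below is a made-up placeholder, just for illustration):

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense

NUM_CLASSES = 27
x = Input(shape=(128,))   # placeholder feature size

# Option 1: softmax in the last layer, loss fed with probabilities
output = Dense(NUM_CLASSES, activation='softmax')(x)
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)

# Option 2: linear last layer, loss fed with raw logits
output = Dense(NUM_CLASSES)(x)
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)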

Upvotes: 2
