user746461

from_logits=True but loss is 0

I'm learning TensorFlow and want to relate its implementation to the underlying mathematics.

As far as I know, the mathematical definition of cross entropy requires its inputs to be probability distributions, i.e. each row should sum to 1. In the following code, y_true is a valid input while y_pred is not mathematically valid:

import tensorflow as tf

y_true = [[0, 1]]
y_pred = [[1.0, 20.0]]
print(tf.keras.losses.CategoricalCrossentropy(from_logits=False).call(y_true, y_pred))
print(tf.keras.losses.CategoricalCrossentropy(from_logits=True).call(y_true, y_pred))

Gives:

tf.Tensor([0.04879016], shape=(1,), dtype=float32)
tf.Tensor([0.], shape=(1,), dtype=float32)

Please find the gist here.

This answer says:

if from_logits=False, means the input is a probability

This answer says:

from_logits=True means the input to crossEntropy layer is normal tensor/logits

This answer says:

"Another name for raw_predictions in the above code is logit

from_logits, I guess, means the input is raw_predictions.

Since my inputs are not probabilities, I set from_logits=True, but the result I get is 0.

Can anyone explain?

Upvotes: 2

Views: 548

Answers (1)

jkr

Reputation: 19260

The cross entropy between labels [[0, 1]] and logits [[1, 20]] should be a value very close to 0 (it may even be displayed as exactly zero due to floating-point imprecision). Represented as probabilities, these logits are approximately [[0.000000005, 1]]. Notice how close these probabilities are to the labels; the cross entropy should therefore be very low.
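
To see why, here is a small sketch of the underlying arithmetic, using tf.nn.softmax and the standard softmax cross-entropy formula (not the exact Keras code path):

import tensorflow as tf

logits = tf.constant([[1.0, 20.0]])

# Softmax turns the logits into probabilities.
print(tf.nn.softmax(logits).numpy())
# approximately [[5.6e-09, 1.0]]

# For true class 1, cross entropy is -log(p_1) = log(1 + exp(1 - 20)),
# which is about 5.6e-09, i.e. essentially 0.
print(tf.math.log1p(tf.exp(logits[0, 0] - logits[0, 1])).numpy())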

As OP points out in the question, from_logits=True should be used when operating on unscaled outputs (logits), i.e. on the model's outputs before softmax. Softmax maps unscaled outputs to probabilities; to compute cross entropy on those probabilities, use from_logits=False.

Here is an example:

import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 1]], "float32")
y_pred = tf.convert_to_tensor([[1, 20]], "float32")

ce_logits_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
ce_probs_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)

print(ce_logits_fn(y_true, y_pred))
# tf.Tensor(0.0, shape=(), dtype=float32)

print(ce_probs_fn(y_true, tf.nn.softmax(y_pred)))
# tf.Tensor(1.1920929e-07, shape=(), dtype=float32)

In the example above, the logit of the correct class is much higher than that of the incorrect class, so cross entropy is low. Try with predictions closer together:

import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 1]], "float32")
y_pred = tf.convert_to_tensor([[5, 7]], "float32")

ce_logits_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
ce_probs_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)

print(ce_logits_fn(y_true, y_pred))
# tf.Tensor(0.12692805, shape=(), dtype=float32)

print(ce_probs_fn(y_true, tf.nn.softmax(y_pred)))
# tf.Tensor(0.126928, shape=(), dtype=float32)
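
The number above can also be checked by hand. For logits [5, 7] and true class index 1, cross entropy is -log(softmax([5, 7])[1]) = log(1 + exp(5 - 7)), a sketch of which (not the exact Keras code path) is:

import tensorflow as tf

# Same value as the loss printed above.
print(tf.math.log1p(tf.exp(-2.0)).numpy())
# ~0.12692805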

Upvotes: 1
