I'm learning TensorFlow and want to relate the TensorFlow implementation to the underlying mathematics. From my knowledge, the mathematical cross entropy, -sum(y_true * log(y_pred)), requires its inputs to be probability distributions, i.e. to sum to 1. In the following code, y_true is a valid input, while y_pred is not a mathematically valid one:
import tensorflow as tf
y_true = [[0, 1]]
y_pred = [[1.0, 20.0]]
print(tf.keras.losses.CategoricalCrossentropy(from_logits=False).call(y_true, y_pred))
print(tf.keras.losses.CategoricalCrossentropy(from_logits=True).call(y_true, y_pred))
Gives:
tf.Tensor([0.04879016], shape=(1,), dtype=float32)
tf.Tensor([0.], shape=(1,), dtype=float32)
Please find the gist here.
This answer says: "if from_logits=False, means the input is a probability".
This answer says: "from_logits=True means the input to crossEntropy layer is normal tensor/logits".
This answer says: "Another name for raw_predictions in the above code is logit". from_logits, I guess, means the input is raw_predictions.
Since my inputs are not probabilities, I set from_logits=True, but the result I get is 0. Can anyone explain?
Upvotes: 2
Views: 548
The cross entropy between labels [[0, 1]] and logits [[1, 20]] should be a value very close to 0 (and some outputs might represent it as zero due to floating point imprecision). Represented as probabilities, these logits would be approximately [[0.000000005, 1]]. Notice how close these probabilities are to the labels. The cross entropy should therefore be very low.
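You can sanity-check those numbers by hand. This uses plain NumPy rather than Keras, purely as an illustration (double precision, so the tiny probability doesn't vanish):
import numpy as np
logits = np.array([1.0, 20.0])
probs = np.exp(logits) / np.exp(logits).sum()   # softmax computed by hand
print(probs)                                    # ≈ [5.6e-09, 1.0]
# cross entropy against the label [0, 1] is just -log of the true-class probability
print(-np.log(probs[1]))                        # ≈ 5.6e-09, i.e. essentially 0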
As OP points out in their question, from_logits=True should be used when operating on unscaled outputs. Practically speaking, from_logits=True is used if operating on outputs before softmax; softmax maps unscaled outputs to probabilities. To compute cross entropy of those probabilities, from_logits=False should be used.
Here is an example:
import tensorflow as tf
y_true = tf.convert_to_tensor([[0, 1]], "float32")
y_pred = tf.convert_to_tensor([[1, 20]], "float32")
ce_logits_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
ce_probs_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(ce_logits_fn(y_true, y_pred))
# tf.Tensor(0.0, shape=(), dtype=float32)
print(ce_probs_fn(y_true, tf.nn.softmax(y_pred)))
# tf.Tensor(1.1920929e-07, shape=(), dtype=float32)
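As for why one result prints as exactly 0.0 and the other as about 1.19e-07: my understanding is that this is pure float32 behaviour. The true loss here is log(1 + exp(-19)) ≈ 5.6e-09; in float32, 1 + exp(-19) rounds to exactly 1, so the logit path returns 0.0, while the probability path clips predictions away from 0 and 1 (Keras' backend epsilon defaults to 1e-7, I believe), leaving a residual of roughly 1e-7. A rough NumPy sketch of both effects:
import numpy as np
# exact value in double precision
print(np.log1p(np.exp(-19.0)))                               # ≈ 5.6e-09
# float32: 1 + exp(-19) rounds to exactly 1.0, so the log gives 0.0
print(np.log(np.float32(1.0) + np.float32(np.exp(-19.0))))   # 0.0
# probability path: predictions are clipped to [eps, 1 - eps] before the log
print(-np.log(np.float32(1.0 - 1e-7)))                       # ≈ 1.19e-07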
In the example above, the value of the correct class is much higher than that of the incorrect class, so the cross entropy is low. Try predictions that are closer together:
import tensorflow as tf
y_true = tf.convert_to_tensor([[0, 1]], "float32")
y_pred = tf.convert_to_tensor([[5, 7]], "float32")
ce_logits_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
ce_probs_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(ce_logits_fn(y_true, y_pred))
# tf.Tensor(0.12692805, shape=(), dtype=float32)
print(ce_probs_fn(y_true, tf.nn.softmax(y_pred)))
# tf.Tensor(0.126928, shape=(), dtype=float32)
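Again, this value can be checked by hand (plain NumPy, outside of Keras): softmax([5, 7]) puts about 0.88 on the correct class, and -log(0.88) gives the ≈0.127 reported above.
import numpy as np
logits = np.array([5.0, 7.0])
probs = np.exp(logits) / np.exp(logits).sum()   # softmax([5, 7]) ≈ [0.1192, 0.8808]
print(-np.log(probs[1]))                        # ≈ 0.12693, matching both losses above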
Upvotes: 1