Reputation: 115
In TensorFlow, the documentation for SparseCategoricalCrossentropy states that using from_logits=True
(and therefore excluding the softmax operation from the last model layer) is more numerically stable for the loss calculation.
Why is this the case?
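For reference, this is a minimal sketch of the two setups the documentation compares (the layer size and output dimension are just placeholders):

    import tensorflow as tf

    # Setup A: the last layer emits raw logits; the loss applies softmax internally.
    logits_model = tf.keras.Sequential([
        tf.keras.layers.Dense(10)  # no activation -> logits
    ])
    logits_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    # Setup B: the last layer applies softmax; the loss receives probabilities.
    probs_model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="softmax")
    ])
    probs_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)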
Upvotes: 1
Views: 1739
Reputation: 399
A bit late to the party, but I think the numerical stability has to do with the precision of floats and overflows.
Say you want to calculate np.exp(2000): it gives you an overflow error. However, np.log(np.exp(2000))
can be simplified to just 2000.
By working with logits you avoid large numbers in the intermediate steps, and therefore the overflows and loss of precision they cause.
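Here is a small NumPy sketch of the overflow described above; the values are chosen purely for illustration:

    import numpy as np

    x = np.array([2000.0, 0.0])  # logits

    # Naive softmax overflows: exp(2000) is inf in float64,
    # so the result is [nan, 0.] with a RuntimeWarning.
    naive_softmax = np.exp(x) / np.sum(np.exp(x))

    # Working from logits lets the loss use the log-sum-exp trick instead:
    # log(sum(exp(x))) == m + log(sum(exp(x - m))) with m = max(x),
    # so no huge intermediate values ever appear.
    m = np.max(x)
    log_softmax = x - (m + np.log(np.sum(np.exp(x - m))))  # finite: [0., -2000.]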
Upvotes: 1
Reputation: 2019
First of all, I think there is a good explanation here about whether you should worry about numerical stability at all. Check this answer, but in general you most likely should not need to care about it.
To answer your question "Why is this the case?", let's take a look at the source code:
    def sparse_categorical_crossentropy(target, output, from_logits=False, axis=-1):
        """ ...
        """
        ...
        # Note: tf.nn.sparse_softmax_cross_entropy_with_logits
        # expects logits, Keras expects probabilities.
        if not from_logits:
            _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
            output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
            output = tf.log(output)
        ...
You can see that if from_logits is False, then the output value is clipped to the range [epsilon, 1 - epsilon].
That means that if a probability drifts slightly outside these bounds, the loss will not react to the change.
However, to my knowledge it is quite an exotic situation in which this really matters.
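As a small sketch of that clipping effect (the epsilon here is the Keras backend default of about 1e-7; the numbers are only for illustration):

    import tensorflow as tf

    y_true = tf.constant([0])

    # With probabilities, a near-zero prediction for the true class is clipped
    # to epsilon (~1e-7), so the loss saturates at about -log(1e-7) ~= 16.12.
    probs = tf.constant([[1e-12, 1.0 - 1e-12]])
    loss_probs = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, probs, from_logits=False)

    # With logits the loss keeps reacting: it grows as the logits drift apart.
    logits = tf.constant([[-30.0, 30.0]])
    loss_logits = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, logits, from_logits=True)

    print(loss_probs.numpy(), loss_logits.numpy())  # ~[16.12] vs ~[60.]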
Upvotes: 0