marou

Reputation: 115

Why is computing the loss from logits more numerically stable?

In TensorFlow the documentation for SparseCategoricalCrossentropy states that using from_logits=True and therefore excluding the softmax operation in the last model layer is more numerically stable for the loss calculation.

Why is this the case?
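For context, the two setups being compared would look roughly like this (a minimal sketch; the layer sizes and optimizer are arbitrary, just for illustration):

import tensorflow as tf

# Option A: softmax inside the model, the loss receives probabilities
model_probs = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model_probs.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
)

# Option B: no softmax on the last layer, the loss receives raw logits
model_logits = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),  # linear output, i.e. logits
])
model_logits.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)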

Upvotes: 1

Views: 1739

Answers (2)

3UqU57GnaX

Reputation: 399

A bit late to the party, but I think the numerical stability comes down to the limited precision and range of floating-point numbers, i.e. overflow and underflow.

Say you want to calculate np.exp(2000): it overflows (NumPy warns and returns inf). However, np.log(np.exp(2000)) simplifies analytically to just 2000, so if you stay in log space you never have to materialize that huge intermediate value.

By working with the logits directly you circumvent those large intermediate values, avoiding the overflow and the loss of precision.
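To make this concrete, here is a small NumPy sketch (the logit values are arbitrary, chosen only to force the failure): the naive softmax-then-log route blows up, while staying in log space, roughly what a fused softmax cross-entropy op does internally, keeps everything finite.

import numpy as np

logits = np.array([2000.0, 0.0, -2000.0])

# Naive route: softmax first, then log. exp(2000) overflows to inf,
# the division then produces nan and 0, and log(0) gives -inf.
with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    probs = np.exp(logits) / np.sum(np.exp(logits))
    naive_log_probs = np.log(probs)
print(naive_log_probs)        # [nan, -inf, -inf], useless for a loss

# Stable route: stay in log space and subtract the max logit before
# exponentiating (the log-sum-exp trick), so no huge intermediates appear.
shifted = logits - np.max(logits)
stable_log_probs = shifted - np.log(np.sum(np.exp(shifted)))
print(stable_log_probs)       # [0., -2000., -4000.], finite and accurate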

Upvotes: 1

olegr

Reputation: 2019

First of all, here is what I think is a good explanation of whether you should worry about numerical stability at all. Check that answer, but in general you most likely do not need to care about it.

To answer your question "Why is this the case?" let's take a look at the source code:

def sparse_categorical_crossentropy(target, output, from_logits=False, axis=-1):
    """ ...
    """
    ...

    # Note: tf.nn.sparse_softmax_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output)
    ...

You can see that if from_logits is False, the output values are clipped to the range [epsilon, 1 - epsilon]. That means that once a predicted probability moves outside these bounds, further changes to it no longer affect the result.
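As a rough illustration of that clipping (a sketch assuming the default Keras epsilon of 1e-7): once the predicted probability of the true class drops below epsilon, the probability-based loss saturates, while the logits-based loss keeps tracking how wrong the model is.

import numpy as np
import tensorflow as tf

y_true = np.array([0])                  # the correct class is index 0
logits = np.array([[-50.0, 0.0, 0.0]])  # the model is confidently wrong
probs = tf.nn.softmax(logits)           # p(class 0) is about 1e-22

loss_from_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
loss_from_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Probability path: p(class 0) is clipped up to epsilon (1e-7), so the loss
# saturates at -log(1e-7), about 16.1, no matter how wrong the logits get.
print(float(loss_from_probs(y_true, probs)))

# Logits path: the fused op keeps the full information, so the loss keeps
# growing with the margin between wrong and right logits, about 50.7 here.
print(float(loss_from_logits(y_true, logits)))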

However, to my knowledge it is a fairly exotic situation in which this really matters.

Upvotes: 0
