Reputation: 24099
Cross entropy formula: H(p, q) = -\sum_x p(x) log q(x)
But why does the following give loss = 0.7437 instead of loss = 0 (since 1*log(1) = 0)?
import torch
import torch.nn as nn
# Variable is no longer needed (deprecated since PyTorch 0.4); plain tensors work directly.
output = torch.tensor([[0.0, 0.0, 0.0, 1.0]])  # shape (1, 4): one sample, four classes
target = torch.tensor([3])                     # target class index (int64)
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)  # tensor(0.7437)
Upvotes: 68
Views: 100591
Reputation: 24201
The combination of nn.LogSoftmax and nn.NLLLoss is equivalent to using nn.CrossEntropyLoss. This terminology is a particularity of PyTorch, as nn.NLLLoss [sic] in fact computes the cross entropy, but with log-probability predictions as inputs, whereas nn.CrossEntropyLoss takes scores (sometimes called logits). Technically, nn.NLLLoss is the cross entropy between the Dirac distribution, which puts all mass on the target, and the predicted distribution given by the log-probability inputs.
PyTorch's CrossEntropyLoss expects unbounded scores (interpretable as logits / log-odds) as input, not probabilities (as cross entropy is traditionally defined).
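A minimal pure-Python sketch of the equivalence described above. The helper names log_softmax, nll_loss and cross_entropy_from_scores are hypothetical stand-ins, not PyTorch functions; in PyTorch itself the pairing would be nn.LogSoftmax(dim=1) followed by nn.NLLLoss(). The sketch computes the loss on a small made-up batch twice: once as log-softmax plus NLL, once as the textbook cross entropy between a one-hot (Dirac) target and softmax probabilities.

```python
import math

def log_softmax(scores):
    """Log-probabilities from raw scores: x_i - log(sum_j exp(x_j)).
    Plays the role of nn.LogSoftmax(dim=1) for a single sample."""
    m = max(scores)  # shift by the max so exp() cannot overflow
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]

def nll_loss(log_probs_batch, targets):
    """Plays the role of nn.NLLLoss() with its default 'mean' reduction:
    the average of -log_probs[target] over the batch."""
    return sum(-lp[t] for lp, t in zip(log_probs_batch, targets)) / len(targets)

def cross_entropy_from_scores(scores_batch, targets):
    """Cross entropy computed the 'long way': softmax probabilities q, then
    -sum_x p(x) log q(x) against a one-hot (Dirac) target p."""
    total = 0.0
    for scores, t in zip(scores_batch, targets):
        exps = [math.exp(s) for s in scores]
        z = sum(exps)
        q = [e / z for e in exps]
        p = [1.0 if i == t else 0.0 for i in range(len(scores))]
        total += -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return total / len(targets)

# Hypothetical batch: the question's scores plus one extra sample.
scores_batch = [[0.0, 0.0, 0.0, 1.0], [2.0, -1.0, 0.5, 0.0]]
targets = [3, 0]

a = nll_loss([log_softmax(s) for s in scores_batch], targets)
b = cross_entropy_from_scores(scores_batch, targets)
print(math.isclose(a, b))  # True
```

Because nll_loss only reads the entry at the target index, it gives the same value as the full sum over a one-hot target: every other term is multiplied by zero.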
Upvotes: 8
Reputation: 1066
In your example you are treating the output [0, 0, 0, 1] as probabilities, as required by the mathematical definition of cross entropy. But PyTorch treats them as raw outputs that don't need to sum to 1 and must first be converted into probabilities, for which it uses the softmax function.
So H(p, q) becomes:
H(p, softmax(output))
Translating the output [0, 0, 0, 1] into probabilities:
softmax([0, 0, 0, 1]) = [0.1749, 0.1749, 0.1749, 0.4754]
whence (p puts all its mass on class 3, so only that term of the sum survives):
-log(0.4754) = 0.7437
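To make the contrast concrete, here is a minimal pure-Python sketch (the helpers cross_entropy and softmax are illustrative, not a library API) that evaluates the textbook H(p, q) twice: once reading the output directly as probabilities, as the question expected, and once after softmax, as PyTorch does.

```python
import math

def cross_entropy(p, q):
    """Textbook H(p, q) = -sum_x p(x) * log q(x); p and q are probability lists.
    Terms with p(x) == 0 are skipped (0 * log 0 is taken as 0)."""
    return sum(-px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

output = [0.0, 0.0, 0.0, 1.0]
p = [0.0, 0.0, 0.0, 1.0]  # one-hot target distribution for class 3

# Reading `output` as probabilities (what the question expected):
print(cross_entropy(p, output) == 0.0)  # True

# Reading `output` as scores and normalizing with softmax first (what PyTorch does):
probs = softmax(output)
print([round(x, 4) for x in probs])       # [0.1749, 0.1749, 0.1749, 0.4754]
print(round(cross_entropy(p, probs), 4))  # 0.7437
```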
Upvotes: 105
Reputation: 950
I would like to add an important note, as this often leads to confusion.
Softmax is not a loss function, nor is it really an activation function. It has a very specific task: it is used in multi-class classification to normalize the scores for the given classes, giving a probability for each class such that the probabilities sum to 1.
Softmax is combined with cross-entropy loss to calculate the loss of a model.
Unfortunately, because this combination is so common, it is often abbreviated: some use the term Softmax Loss, whereas PyTorch calls it only Cross-Entropy Loss.
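A minimal pure-Python sketch of that combination (softmax and softmax_cross_entropy are illustrative names, not a library API). Softmax turns arbitrary real-valued scores, including negative ones, into a distribution that sums to 1, and the loss for the question's target class only approaches 0 as that class's score pulls far ahead of the others.

```python
import math

def softmax(scores):
    """Normalize raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max does not change the result but avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(scores, target):
    """Softmax followed by cross entropy against a one-hot target,
    i.e. -log of the predicted probability of the target class."""
    return -math.log(softmax(scores)[target])

# Scores may be any real numbers; softmax still yields a valid distribution.
probs = softmax([-2.0, 0.5, 3.0])
print(all(p > 0 for p in probs), math.isclose(sum(probs), 1.0))  # True True

# The loss for class 3 shrinks toward 0 only as its score dominates the others.
for margin in [1.0, 5.0, 20.0]:
    print(margin, softmax_cross_entropy([0.0, 0.0, 0.0, margin], 3))
```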
Upvotes: 19
Reputation: 37691
Your understanding is correct, but PyTorch doesn't compute cross entropy that way. PyTorch uses the following formula:
loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))
= -x[class] + log(\sum_j exp(x[j]))
Since, in your scenario, x = [0, 0, 0, 1] and class = 3, evaluating the above expression gives:
loss(x, class) = -1 + log(exp(0) + exp(0) + exp(0) + exp(1))
= 0.7437
PyTorch uses the natural logarithm.
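A minimal pure-Python check of the formula above, with illustrative helper names. The logsumexp helper subtracts the maximum score before exponentiating; that is a common numerical-stability trick assumed for this sketch, not a claim about how PyTorch's kernel is written internally.

```python
import math

def logsumexp(xs):
    """log(sum_j exp(x_j)), shifted by max(xs) so large scores do not overflow."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def cross_entropy_loss(x, cls):
    """loss(x, class) = -x[class] + log(sum_j exp(x[j])), using the natural log."""
    return -x[cls] + logsumexp(x)

x = [0.0, 0.0, 0.0, 1.0]

# First form: -log(exp(x[class]) / sum_j exp(x[j])), evaluated literally.
direct = -math.log(math.exp(x[3]) / sum(math.exp(v) for v in x))
# Second, rearranged form.
rearranged = cross_entropy_loss(x, 3)

print(round(rearranged, 4))              # 0.7437
print(math.isclose(direct, rearranged))  # True

# The rearranged form also copes with scores whose exp() would overflow a float.
print(cross_entropy_loss([0.0, 0.0, 0.0, 1000.0], 3))  # 0.0
```

Evaluating the first form literally with a score of 1000 would make math.exp raise OverflowError, which is one reason the -x[class] + log(\sum_j exp(x[j])) arrangement is convenient in practice.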
Upvotes: 37