Reputation: 2350
This is a very newbie question, but I'm trying to wrap my head around cross_entropy loss in PyTorch, so I created the following code:
import torch

x = torch.FloatTensor([[1., 0., 0.],
                       [0., 1., 0.],
                       [0., 0., 1.]])
print(x.argmax(dim=1))

y = torch.LongTensor([0, 1, 2])
loss = torch.nn.functional.cross_entropy(x, y)
print(loss)
which outputs the following:
tensor([0, 1, 2])
tensor(0.5514)
What I don't understand is: given that my input matches the expected output, why is the loss not 0?
Upvotes: 8
Views: 13131
Reputation: 3750
A complete, copy/paste-runnable example showing a categorical cross-entropy loss calculation via:
- paper + pencil + calculator
- NumPy
- PyTorch
Other than minor rounding differences, all three come out the same:
import torch
import torch.nn.functional as F
import numpy as np


def main():
    ### paper + pencil + calculator calculation #################
    """
    predictions before softmax:
                     columns
                     (4 categories)
    rows          1, 4, 1, 1
    (3 samples)   5, 1, 2, 1
                  1, 2, 5, 1

    ground truths (NOT one hot encoded)
    1, 0, 2

    preds softmax calculation:
    (e^1/(e^1+e^4+e^1+e^1)), (e^4/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1))
    (e^5/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1)), (e^2/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1))
    (e^1/(e^1+e^2+e^5+e^1)), (e^2/(e^1+e^2+e^5+e^1)), (e^5/(e^1+e^2+e^5+e^1)), (e^1/(e^1+e^2+e^5+e^1))

    preds after softmax:
    0.04332, 0.87005, 0.04332, 0.04332
    0.92046, 0.01686, 0.04583, 0.01686
    0.01686, 0.04583, 0.92046, 0.01686

    categorical cross-entropy loss calculation:
    (-ln(0.87005) + -ln(0.92046) + -ln(0.92046)) / 3 = 0.10166

    Note the loss ends up relatively low because all 3 predictions are correct
    """

    ### calculation via NumPy ###################################

    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = np.array([[1, 4, 1, 1],
                      [5, 1, 2, 1],
                      [1, 2, 5, 1]], dtype=np.float32)

    # ground truths, NOT one hot encoded
    gndTrs = np.array([1, 0, 2], dtype=np.int64)

    preds = softmax(preds)

    loss = calcCrossEntropyLoss(preds, gndTrs)

    print('\n' + 'NumPy loss = ' + str(loss) + '\n')

    ### calculation via PyTorch #################################

    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = torch.tensor([[1, 4, 1, 1],
                          [5, 1, 2, 1],
                          [1, 2, 5, 1]], dtype=torch.float32)

    # ground truths, NOT one hot encoded
    gndTrs = torch.tensor([1, 0, 2], dtype=torch.int64)

    loss = F.cross_entropy(preds, gndTrs)

    print('PyTorch loss = ' + str(loss) + '\n')
# end function


def softmax(x: np.ndarray) -> np.ndarray:
    numSamps = x.shape[0]

    for i in range(numSamps):
        x[i] = np.exp(x[i]) / np.sum(np.exp(x[i]))
    # end for

    return x
# end function


def calcCrossEntropyLoss(preds: np.ndarray, gndTrs: np.ndarray) -> np.ndarray:
    assert len(preds.shape) == 2
    assert len(gndTrs.shape) == 1
    assert preds.shape[0] == gndTrs.shape[0]

    numSamps = preds.shape[0]

    mySum = 0.0
    for i in range(numSamps):
        # Note: in NumPy, "log" is the natural log (ln)
        mySum += -1 * np.log(preds[i, gndTrs[i]])
    # end for

    crossEntLoss = mySum / numSamps

    return crossEntLoss
# end function


if __name__ == '__main__':
    main()
program output:
NumPy loss = 0.10165966302156448
PyTorch loss = tensor(0.1017)
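For reference, running the same kind of calculation on the 3x3 logits from the question shows where the asker's 0.5514 comes from. A minimal NumPy sketch (the variable names are just illustrative):
import numpy as np

# logits and targets from the question
logits = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]], dtype=np.float32)
targets = np.array([0, 1, 2], dtype=np.int64)

# softmax row by row, then average the negative log of the target-class probabilities
probs = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
loss = -np.mean(np.log(probs[np.arange(len(targets)), targets]))

print(probs[0])  # [0.5761 0.2119 0.2119]
print(loss)      # ~0.5514, matching torch.nn.functional.cross_entropy
Each correct class only gets a probability of about 0.58 after softmax, so the average negative log cannot be 0.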
Upvotes: -1
Reputation: 8699
The torch.nn.functional.cross_entropy function combines log_softmax (softmax followed by a logarithm) and nll_loss (negative log likelihood loss) in a single function, i.e. it is equivalent to F.nll_loss(F.log_softmax(x, 1), y).
Code:
import torch
import torch.nn.functional as F

x = torch.FloatTensor([[1., 0., 0.],
                       [0., 1., 0.],
                       [0., 0., 1.]])
y = torch.LongTensor([0, 1, 2])

print(torch.nn.functional.cross_entropy(x, y))
print(F.softmax(x, 1).log())
print(F.log_softmax(x, 1))
print(F.nll_loss(F.log_softmax(x, 1), y))
output:
tensor(0.5514)
tensor([[-0.5514, -1.5514, -1.5514],
[-1.5514, -0.5514, -1.5514],
[-1.5514, -1.5514, -0.5514]])
tensor([[-0.5514, -1.5514, -1.5514],
[-1.5514, -0.5514, -1.5514],
[-1.5514, -1.5514, -0.5514]])
tensor(0.5514)
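To make the nll_loss step explicit: it simply picks the log-probability of the target class for each row and averages the negatives. A short sketch, assuming the same x and y as above:
log_probs = F.log_softmax(x, 1)

# gather the log-probability of the target class for each sample,
# negate, and average -- this is what nll_loss does with default settings
manual_loss = (-log_probs[torch.arange(len(y)), y]).mean()
print(manual_loss)  # tensor(0.5514), same as F.cross_entropy(x, y)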
Read more about the torch.nn.functional.cross_entropy loss function here.
Upvotes: 6
Reputation: 666
That is because the input you give to the cross-entropy function is not probabilities, as you assumed, but logits, which are transformed into probabilities with this formula:
probas = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
So the matrix of probabilities PyTorch will use in your case is:
[0.5761168847658291, 0.21194155761708547, 0.21194155761708547]
[0.21194155761708547, 0.5761168847658291, 0.21194155761708547]
[0.21194155761708547, 0.21194155761708547, 0.5761168847658291]
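The correct-class probabilities are the diagonal entries (~0.576), so the loss is -ln(0.5761) ≈ 0.5514 rather than 0. The loss only approaches 0 as those probabilities approach 1, which requires much larger gaps between the logits. A quick sketch using the same x and y as in the question:
import torch
import torch.nn.functional as F

x = torch.eye(3)                    # the logits from the question
y = torch.LongTensor([0, 1, 2])

print(F.softmax(x, dim=1)[0])       # tensor([0.5761, 0.2119, 0.2119])
print(F.cross_entropy(x, y))        # tensor(0.5514)

# widening the gap between the logits pushes the correct-class
# probabilities toward 1 and the loss toward 0
print(F.cross_entropy(100 * x, y))  # tensor(0.) (or extremely close to 0)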
Upvotes: 5