Jarym

Reputation: 2350

Trying to understand cross_entropy loss in PyTorch

This is a very newbie question, but I'm trying to wrap my head around cross_entropy loss in PyTorch, so I created the following code:

import torch

x = torch.FloatTensor([[1., 0., 0.],
                       [0., 1., 0.],
                       [0., 0., 1.]])

print(x.argmax(dim=1))

y = torch.LongTensor([0,1,2])
loss = torch.nn.functional.cross_entropy(x, y)

print(loss)

which outputs the following:

tensor([0, 1, 2])
tensor(0.5514)

What I don't understand is: given that my input matches the expected output, why is the loss not 0?

Upvotes: 8

Views: 13131

Answers (3)

cdahms

Reputation: 3750

Here is a complete, copy/paste-runnable example showing a categorical cross-entropy loss calculation via:

- paper + pencil + calculator
- NumPy
- PyTorch

Other than minor rounding differences, all 3 come out the same:

import torch
import torch.nn.functional as F

import numpy as np

def main():

    ### paper + pencil + calculator calculation #################

    """
    predictions before softmax:
                  columns
               (4 categories)
        rows     1, 4, 1, 1
    (3 samples)  5, 1, 2, 1
                 1, 2, 5, 1

    ground truths (NOT one hot encoded)
          1, 0, 2

    preds softmax calculation:
    (e^1/(e^1+e^4+e^1+e^1)), (e^4/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1))
    (e^5/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1)), (e^2/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1))
    (e^1/(e^1+e^2+e^5+e^1)), (e^2/(e^1+e^2+e^5+e^1)), (e^5/(e^1+e^2+e^5+e^1)), (e^1/(e^1+e^2+e^5+e^1))

    preds after softmax:
    0.04332, 0.87005, 0.04332, 0.04332
    0.92046, 0.01686, 0.04583, 0.01686
    0.01686, 0.04583, 0.92046, 0.01686

    categorical cross-entropy loss calculation:
    (-ln(0.87005) + -ln(0.92046) + -ln(0.92046)) / 3 = 0.10166

    Note the loss ends up relatively low because all 3 predictions are correct
    """


    ### calculation via NumPy ###################################

    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = np.array([[1, 4, 1, 1],
                      [5, 1, 2, 1],
                      [1, 2, 5, 1]], dtype=np.float32)

    # ground truths, NOT one hot encoded
    gndTrs = np.array([1, 0, 2], dtype=np.int64)

    preds = softmax(preds)

    loss = calcCrossEntropyLoss(preds, gndTrs)

    print('\n' + 'NumPy loss = ' + str(loss) + '\n')

    ### calculation via PyTorch #################################

    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = torch.tensor([[1, 4, 1, 1],
                          [5, 1, 2, 1],
                          [1, 2, 5, 1]], dtype=torch.float32)

    # ground truths, NOT one hot encoded
    gndTrs = torch.tensor([1, 0, 2], dtype=torch.int64)

    loss = F.cross_entropy(preds, gndTrs)

    print('PyTorch loss = ' + str(loss) + '\n')
# end function

def softmax(x: np.ndarray) -> np.ndarray:
    numSamps = x.shape[0]

    for i in range(numSamps):
        x[i] = np.exp(x[i]) / np.sum(np.exp(x[i]))
    # end for

    return x
# end function

def calcCrossEntropyLoss(preds: np.ndarray, gndTrs: np.ndarray) -> float:
    assert len(preds.shape) == 2
    assert len(gndTrs.shape) == 1
    assert preds.shape[0] == gndTrs.shape[0]

    numSamps = preds.shape[0]

    mySum = 0.0
    for i in range(numSamps):
        # Note: in numpy, "log" is actually natural log (ln)
        mySum += -1 * np.log(preds[i, gndTrs[i]])
    # end for

    crossEntLoss = mySum / numSamps
    return crossEntLoss
# end function

if __name__ == '__main__':
    main()

program output:

NumPy loss = 0.10165966302156448

PyTorch loss = tensor(0.1017)
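As a side note, the per-row softmax loop above can also be written in vectorized NumPy. Below is a minimal sketch using the same made-up preds and gndTrs as above; keepdims=True keeps the row sums broadcastable:

import numpy as np

# same made-up logits and ground truths as in the example above
preds = np.array([[1, 4, 1, 1],
                  [5, 1, 2, 1],
                  [1, 2, 5, 1]], dtype=np.float32)
gndTrs = np.array([1, 0, 2], dtype=np.int64)

# row-wise softmax, vectorized
exps = np.exp(preds)
probs = exps / exps.sum(axis=1, keepdims=True)

# take the probability of the correct class for each sample, then average -ln
loss = -np.log(probs[np.arange(len(gndTrs)), gndTrs]).mean()

print(loss)    # ~0.10166, matching the loop version above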

Upvotes: -1

Anubhav Singh

Reputation: 8699

The torch.nn.functional.cross_entropy function combines log_softmax (softmax followed by a logarithm) and nll_loss (negative log likelihood loss) in a single function, i.e. it is equivalent to F.nll_loss(F.log_softmax(x, 1), y).

Code:

import torch
import torch.nn.functional as F

x = torch.FloatTensor([[1.,0.,0.],
                       [0.,1.,0.],
                       [0.,0.,1.]])
y = torch.LongTensor([0,1,2])

print(torch.nn.functional.cross_entropy(x, y))

print(F.softmax(x, 1).log())
print(F.log_softmax(x, 1))

print(F.nll_loss(F.log_softmax(x, 1), y))

output:

tensor(0.5514)
tensor([[-0.5514, -1.5514, -1.5514],
        [-1.5514, -0.5514, -1.5514],
        [-1.5514, -1.5514, -0.5514]])
tensor([[-0.5514, -1.5514, -1.5514],
        [-1.5514, -0.5514, -1.5514],
        [-1.5514, -1.5514, -0.5514]])
tensor(0.5514)

Read more about the torch.nn.functional.cross_entropy loss function here.
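To make the equivalence concrete, here is a minimal sketch (reusing the x and y from the question) of what nll_loss does with that log_softmax matrix: it picks out the log-probability of the target class in each row and averages the negated values:

import torch
import torch.nn.functional as F

x = torch.FloatTensor([[1.,0.,0.],
                       [0.,1.,0.],
                       [0.,0.,1.]])
y = torch.LongTensor([0,1,2])

log_probs = F.log_softmax(x, dim=1)

# pick the log-probability of the correct class for each sample,
# negate, and average -- the same reduction nll_loss applies by default
manual_loss = -log_probs[torch.arange(len(y)), y].mean()

print(manual_loss)               # tensor(0.5514)
print(F.cross_entropy(x, y))     # tensor(0.5514)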

Upvotes: 6

Robin Nicole

Reputation: 666

That is because the input you give to the cross-entropy function is not probabilities, as you assumed, but logits, which PyTorch transforms into probabilities with the softmax formula:

probas = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)

So the matrix of probabilities PyTorch will use in your case is:

[0.5761168847658291, 0.21194155761708547, 0.21194155761708547]
[0.21194155761708547, 0.5761168847658291, 0.21194155761708547]
[0.21194155761708547, 0.21194155761708547, 0.5761168847658291]
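A quick sketch to tie this back to the question: the loss is the average of -ln of the probability assigned to the correct class, i.e. -ln(0.5761...) ≈ 0.5514, which is exactly what you got. The loss only approaches 0 when the logit of the correct class is much larger than the others, e.g. when the logits are scaled up:

import torch
import torch.nn.functional as F

x = torch.FloatTensor([[1.,0.,0.],
                       [0.,1.,0.],
                       [0.,0.,1.]])
y = torch.LongTensor([0,1,2])

print(torch.softmax(x, dim=1))      # rows of [0.5761, 0.2119, 0.2119], etc.
print(F.cross_entropy(x, y))        # tensor(0.5514) == -ln(0.5761)

# with much larger logits the softmax is nearly one-hot and the loss tends to 0
print(F.cross_entropy(100 * x, y))  # tensor(0.) up to float precision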

Upvotes: 5
