user8510613

Reputation: 1282

Why can the loss function be applied to tensors of different sizes?

For example, I have a net that takes a tensor [N, 7] (N is the number of samples) as input and produces a tensor [N, 4] as output, where the "4" represents the probabilities of the different classes.

And the training data's labels are tensors of the form [N], with values ranging from 0 to 3 (representing the ground-truth class).

Here's my question: I've seen some demos that directly apply the loss function to the output tensor and the label tensor. I wonder why this works, since they have different sizes, and their sizes don't seem to fit the "broadcasting semantics".

Here’s the minimal demo.

import torch
import torch.nn as nn
import torch.optim as optim

if __name__ == '__main__':
    features = torch.randn(2, 7)
    gt = torch.tensor([1, 1])
    model = nn.Sequential(
        nn.Linear(7, 4),
        nn.ReLU(),
        nn.Linear(4, 4)
    )
    optimizer = optim.SGD(model.parameters(), lr=0.005)
    f = nn.CrossEntropyLoss()

    for epoch in range(1000):
        optimizer.zero_grad()
        output = model(features)
        loss = f(output, gt)
        loss.backward()
        optimizer.step()

Upvotes: 2

Views: 1803

Answers (1)

MBT

Reputation: 24099

In PyTorch the implementation is:

loss(x, class) = -log(exp(x[class]) / sum_j exp(x[j]))
               = -x[class] + log(sum_j exp(x[j]))

Link to the Documentation: https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss


So implementing this formula in PyTorch, you get:

import torch
import torch.nn.functional as F

output = torch.tensor([ 0.1998, -0.2261, -0.0388,  0.1457])  # raw scores for one sample (4 classes)
target = torch.LongTensor([1])                               # index of the ground-truth class

# implementing the formula above: -x[class] + log(sum_j exp(x[j]))
print('manual  cross-entropy:', (-output[target] + torch.log(torch.sum(torch.exp(output))))[0])

# calling the built-in cross-entropy function to check the result
# (unsqueeze adds the batch dimension, so output has shape [1, 4] and target has shape [1])
print('pytorch cross-entropy:', F.cross_entropy(output.unsqueeze(0), target))

Output:

manual  cross-entropy: tensor(1.6462)
pytorch cross-entropy: tensor(1.6462)
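
Regarding the shapes in your question: nn.CrossEntropyLoss does not broadcast the two tensors. The [N] target tensor is interpreted as one class index per sample and is only used to select the corresponding score from each row of the [N, C] output. Here is a minimal sketch of the batched case (the second sample's scores are made-up example values):

import torch
import torch.nn.functional as F

# a batch of N=2 samples, each with C=4 raw class scores, and one class index per sample
output = torch.tensor([[ 0.1998, -0.2261, -0.0388,  0.1457],
                       [-0.3545,  0.6224,  0.0150, -0.1721]])
target = torch.LongTensor([1, 3])

# per-sample loss: -x[class] + log(sum_j exp(x[j])); the target only picks a column per row
per_sample = -output[torch.arange(2), target] + torch.log(torch.sum(torch.exp(output), dim=1))

# the default reduction averages over the batch, which matches F.cross_entropy
print('manual  cross-entropy:', per_sample.mean())
print('pytorch cross-entropy:', F.cross_entropy(output, target))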

I hope this helps!

Upvotes: 4
