tamuhey

Reputation: 3535

Pytorch: Why loss functions are implemented both in nn.modules.loss and nn.functional module?

Many loss functions in Pytorch are implemented both in nn.modules.loss and nn.functional.

For example, the two lines below return the same result.

import torch
import torch.nn as nn
import torch.nn.functional as F

x, y = torch.randn(4, 5), torch.randn(4, 5)
nn.L1Loss()(x, y)   # class API
F.l1_loss(x, y)     # functional API

Why are there two implementations?

  1. Consistency with other parametric loss functions
  2. Instantiating a loss function brings some benefit
  3. Some other reason

Upvotes: 4

Views: 3092

Answers (2)

njulhy

Reputation: 55

Here is the code of BCEWithLogitsLoss with the docstring stripped:

class BCEWithLogitsLoss(_Loss):
    def __init__(self, weight: Optional[Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean',
                 pos_weight: Optional[Tensor] = None) -> None:
        super(BCEWithLogitsLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)
        self.register_buffer('pos_weight', pos_weight)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.binary_cross_entropy_with_logits(input, target,
                                                  self.weight,
                                                  pos_weight=self.pos_weight,
                                                  reduction=self.reduction)

Setting parameter passing aside, the class and the function implementations are exactly the same. However, the class implementation keeps your code more concise and readable, e.g.

Using the function:

loss_fn = F.binary_cross_entropy_with_logits

def train(model, dataloader, loss_fn, optimizer, weight, size_average, reduce, reduction, pos_weight):
    for x, y in dataloader:
        model.zero_grad()
        y_pred = model(x)
        # every configuration argument has to be threaded through on each call
        loss = loss_fn(y_pred, y, weight, size_average, reduce, reduction, pos_weight)
        loss.backward()
        optimizer.step()

Using the class:

loss_fn = nn.BCEWithLogitsLoss(weight, size_average, reduce, reduction, pos_weight)

def train(model, dataloader, loss_fn, optimizer):
    for x, y in dataloader:
        model.zero_grad()
        y_pred = model(x)
        loss = loss_fn(y_pred, y)   # configuration is already bundled into the instance
        loss.backward()
        optimizer.step()

If you have several such parameters, or switch between different loss functions, the class implementation is better.
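
As a quick sanity check (a minimal sketch with made-up tensors, not part of the original answer), the two APIs return identical values when given the same configuration:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(8, 3)                    # raw logits
y = torch.randint(0, 2, (8, 3)).float()  # binary targets
pos_weight = torch.full((3,), 2.0)

# Class API: configuration is stored on the module instance.
loss_cls = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(x, y)

# Functional API: configuration is passed explicitly on every call.
loss_fn = F.binary_cross_entropy_with_logits(x, y, pos_weight=pos_weight)

assert torch.allclose(loss_cls, loss_fn)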

Upvotes: 1

Jatentaki

Reputation: 13103

I think of it as a partial application situation - it's useful to be able to "bundle" many of the configuration variables with the loss function object. In most cases, your loss function has to take prediction and ground_truth as its arguments. This makes for a fairly uniform basic API across loss functions. However, they differ in the details: not every loss function has a reduction parameter, BCEWithLogitsLoss has weight and pos_weight parameters, and PoissonNLLLoss has log_input and eps. It's handy to write a function like

def one_epoch(model, dataset, loss_fn, optimizer):
    for x, y in dataset:
        model.zero_grad()
        y_pred = model(x)
        loss = loss_fn(y_pred, y)
        loss.backward()
        optimizer.step()

which works equally well with an instantiated BCEWithLogitsLoss as with a PoissonNLLLoss. But it cannot work with their functional counterparts, because of the extra bookkeeping required. You would instead have to first create

loss_fn_packed = functools.partial(F.binary_cross_entropy_with_logits, weight=my_weight, reduction='sum')

and only then can you use it with one_epoch as defined above. But this packing already comes with the object-oriented loss API, along with some bells and whistles (since losses subclass nn.Module, you get forward and backward hooks, moving data between CPU and GPU, etc).
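
For instance (a minimal sketch, assuming a CUDA device is available), because the loss object is an nn.Module, registered buffers such as pos_weight travel with it when you move it:

import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.ones(10))

if torch.cuda.is_available():
    loss_fn = loss_fn.cuda()          # the pos_weight buffer moves to the GPU as well
    print(loss_fn.pos_weight.device)  # cuda:0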

Upvotes: 4
