Eilleen

Reputation: 403

ResNet family classification layer activation function

I am using the ResNet18 pre-trained model for a simple binary image classification task. However, all the tutorials, including PyTorch's own, use nn.Linear(num_of_features, classes) for the final fully connected layer. What I fail to understand is: where is the activation function for that module? Also, if I want to use sigmoid/softmax, how do I go about that?

Thanks for your help in advance, I am kind of new to PyTorch.

Upvotes: 1

Views: 2285

Answers (3)

mujjiga

Reputation: 16906

No, you do not use an activation in the last layer if your loss function is CrossEntropyLoss, because PyTorch's CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

Why do they do that?

You actually need logits (the raw outputs of the final linear layer, before any sigmoid/softmax) for the loss calculation, so it is a correct design not to have the activation as part of the forward pass. Moreover, for predictions you don't need the activation either, because argmax(linear(x)) == argmax(softmax(linear(x))), i.e. softmax does not change the ordering but only the magnitudes (it is a squashing function that maps arbitrary values into the [0, 1] range while preserving the ordering).
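As a minimal sketch of what this looks like for the ResNet18 case from the question (the 2-class head and the dummy batch shapes are assumptions):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)  # raw logits, no activation here

criterion = nn.CrossEntropyLoss()              # applies log-softmax + NLL internally

x = torch.randn(4, 3, 224, 224)                # dummy batch
labels = torch.tensor([0, 1, 1, 0])            # dummy integer class labels

logits = model(x)                              # shape (4, 2), unnormalized scores
loss = criterion(logits, labels)               # pass the logits directly to the loss

# softmax does not change which class wins:
probs = torch.softmax(logits, dim=1)
assert torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
```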

If you want to use activation functions to add some non-linearity, you normally do that by using a multi-layer network and putting the activation functions in the hidden layers rather than after the last one.
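For example, a sketch of what that could look like when replacing the ResNet18 head (the hidden size of 256 is an arbitrary choice):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)

# The activation goes inside the head, not after the final layer:
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),  # hidden layer
    nn.ReLU(),                             # non-linearity between layers
    nn.Linear(256, 2),                     # final layer still outputs raw logits
)
```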

Finally, if you are using other loss functions like NLLLoss, PoissonNLLLoss, or BCELoss, then you have to apply the activation yourself (for example, a log-softmax for NLLLoss, or a sigmoid for BCELoss). On the same note, if you are using BCEWithLogitsLoss you don't need to apply the sigmoid yourself, because this loss combines a Sigmoid layer and the BCELoss in one single class.
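Roughly, for the binary case the two equivalent options look like this (the single-logit head and the dummy tensors are assumptions):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 1)          # raw output of an nn.Linear(num_features, 1) head
targets = torch.rand(4, 1).round()  # dummy 0/1 labels as floats

# Option 1: BCEWithLogitsLoss applies the sigmoid internally
loss_a = nn.BCEWithLogitsLoss()(logits, targets)

# Option 2: BCELoss expects probabilities, so apply the sigmoid yourself
loss_b = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_a, loss_b))  # True (up to floating-point error)
```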

Check the PyTorch docs to see how to use each loss.

Upvotes: 4

abe

Reputation: 987

In the tutorials you see on the internet, people mostly do multi-class classification, for which they use cross-entropy loss, which doesn't require a user-defined activation function at the output: it applies the softmax itself (in fact, applying an activation before cross-entropy is one of the most common mistakes in PyTorch). However, in your case you have a binary classification problem, for which you would use binary cross-entropy loss, which, unlike the other one, does not apply any activation function by itself. So you will need to apply a sigmoid activation (or any activation that maps the real numbers to the range (0, 1)) yourself.
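A minimal sketch of that binary setup (the single-output head is an assumption; you could equally keep two outputs and use CrossEntropyLoss):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 1)  # one logit for binary classification

criterion = nn.BCELoss()                       # expects probabilities in (0, 1)

x = torch.randn(4, 3, 224, 224)                # dummy batch
targets = torch.tensor([[0.], [1.], [1.], [0.]])

probs = torch.sigmoid(model(x))                # apply the sigmoid yourself
loss = criterion(probs, targets)
```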

Upvotes: 2

gizzmole

Reputation: 1627

Usually, no activation function (ReLU or otherwise) is used in the last layer. The output of the torch.nn.Linear layer is fed to the softmax inside the cross-entropy loss, e.g. by using torch.nn.CrossEntropyLoss. What you may be looking for is the binary cross-entropy loss, torch.nn.BCELoss.
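To make the difference concrete (the tensors here are just dummy values), note that the two losses also expect differently shaped inputs and targets:

```python
import torch
import torch.nn as nn

# CrossEntropyLoss: raw logits of shape (N, C) and integer class labels
ce = nn.CrossEntropyLoss()
ce_loss = ce(torch.randn(4, 2), torch.tensor([0, 1, 1, 0]))

# BCELoss: probabilities of shape (N, 1) (or (N,)) and float 0/1 targets
bce = nn.BCELoss()
bce_loss = bce(torch.sigmoid(torch.randn(4, 1)), torch.tensor([[0.], [1.], [1.], [0.]]))
```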

Upvotes: 1
