Reputation: 403
I am using the ResNet18 pre-trained model for a simple binary image classification task. However, all the tutorials, including PyTorch's own, use nn.Linear(num_of_features, classes) for the final fully connected layer. What I fail to understand is: where is the activation function for that module? Also, if I want to use sigmoid/softmax, how do I go about that?
Thanks for your help in advance, I am kinda new to PyTorch.
Upvotes: 1
Views: 2285
Reputation: 16906
No, you do not use an activation in the last layer if your loss function is CrossEntropyLoss, because PyTorch's CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
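For example, something along these lines (the 2-class head and the dummy batch are only illustrative, not part of your code):

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical 2-class setup: replace ResNet18's final fully connected layer.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)  # no activation here

criterion = nn.CrossEntropyLoss()  # applies log-softmax + NLLLoss internally

images = torch.randn(8, 3, 224, 224)  # dummy batch
labels = torch.randint(0, 2, (8,))    # class indices 0 or 1

logits = model(images)                # raw scores, shape (8, 2)
loss = criterion(logits, labels)      # no softmax/sigmoid needed
```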
You actually need logits (the raw outputs of the last linear layer, before any sigmoid/softmax) for the loss calculation, so it is a correct design to not have the activation as part of the forward pass. Moreover, for predictions you don't need the activation either, because argmax(linear(x)) == argmax(softmax(linear(x))), i.e. softmax does not change the ordering but only the magnitudes (it is a squashing function that maps arbitrary values into the [0, 1] range while preserving the ordering).
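A quick toy check of that ordering claim (not part of your model code):

```python
import torch

logits = torch.tensor([[2.0, -1.0], [0.3, 0.7]])
probs = torch.softmax(logits, dim=1)

# softmax only rescales the values; the argmax stays the same
assert torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
```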
If you want to use activation functions to add non-linearity, you normally do that by using a multi-layer NN and putting the activation functions in all layers except the last one.
Finally, if you are using another loss function such as NLLLoss, PoissonNLLLoss, or BCELoss, then you have to apply the activation yourself (e.g. sigmoid for BCELoss, log-softmax for NLLLoss). Again, on the same note, if you are using BCEWithLogitsLoss you don't need to apply a sigmoid, because this loss combines a Sigmoid layer and the BCELoss in one single class.
Check the PyTorch docs to see how to use each loss.
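A small sketch of that BCELoss vs. BCEWithLogitsLoss distinction (a single-output binary head and dummy data are assumed):

```python
import torch
import torch.nn as nn

raw = torch.randn(8, 1)             # raw linear outputs (logits)
targets = torch.rand(8, 1).round()  # 0./1. targets

# Option 1: BCELoss expects probabilities, so apply the sigmoid yourself
loss_a = nn.BCELoss()(torch.sigmoid(raw), targets)

# Option 2: BCEWithLogitsLoss applies the sigmoid internally
loss_b = nn.BCEWithLogitsLoss()(raw, targets)

assert torch.allclose(loss_a, loss_b)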
Upvotes: 4
Reputation: 987
In the tutorials you see on the internet, people mostly do multi-class classification, for which they use cross-entropy loss, which doesn't require a user-defined activation function at the output: it applies the softmax activation itself (in fact, applying an activation function before the cross-entropy is one of the most common mistakes in PyTorch). However, in your case you have a binary classification problem, for which you need to use binary cross-entropy loss, which, unlike the other one, doesn't apply any activation function by itself. So you will need to apply the sigmoid activation (or any other activation that maps the real numbers to the range (0, 1)) yourself.
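A minimal sketch of that setup, assuming a single-logit head on ResNet18 (names and dummy data are illustrative):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 1)  # one output for binary classification

criterion = nn.BCELoss()                       # expects probabilities in (0, 1)

images = torch.randn(4, 3, 224, 224)
targets = torch.tensor([[0.], [1.], [1.], [0.]])

probs = torch.sigmoid(model(images))           # apply the sigmoid yourself before BCELoss
loss = criterion(probs, targets)
```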
Upvotes: 2
Reputation: 1627
Usually, no ReLU activation function is used in the last layer. The output of the torch.nn.Linear layer is fed to the softmax function of the cross-entropy loss, e.g., by using torch.nn.CrossEntropyLoss. What you may be looking for is the binary cross-entropy loss, torch.nn.BCELoss.
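A short sketch contrasting the two losses (the 512-dimensional features and dummy targets are assumptions for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(4, 512)  # dummy features from the backbone

# Two-class head: raw outputs go straight into CrossEntropyLoss
head_ce = nn.Linear(512, 2)
loss_ce = nn.CrossEntropyLoss()(head_ce(x), torch.tensor([0, 1, 1, 0]))

# One-output head: sigmoid + BCELoss for binary cross-entropy
head_bce = nn.Linear(512, 1)
probs = torch.sigmoid(head_bce(x))
loss_bce = nn.BCELoss()(probs, torch.tensor([[0.], [1.], [1.], [0.]]))
```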
Upvotes: 1