Reputation: 1873
I'm asking about the last Layer of U-Net model for Semantic Segmentation what it should be and why?
As I've found a lot of different architectures part of them are using Sigmoid and others are using Softmax in last layer
Upvotes: 2
Views: 4844
Reputation: 178
There's a good foundational article that goes in depth about sigmoid and softmax functions. Here is their summary:
- If your model’s output classes are NOT mutually exclusive and you can choose many of them at the same time, use a sigmoid function on the network’s raw outputs.
- If your model’s output classes are mutually exclusive and you can only choose one, then use a softmax function on the network’s raw outputs.
The article however specifically gives examples of classification tasks. In segmentation tasks, a pixel can only be one class at a time. (For example, in segmenting items on a beach, a pixel can't be both sand AND water.) This results in the often use of softmax in segmentation models, as the classes are mutually exclusive. In other words, a multi-class classification problem.
Sigmoid deals with multi-label classification problems, allowing for a pixel to share a label (a pixel can be both sand and water, both sky and water, even sky+water+sand+sun+etc.), which doesn't make sense. The exception, however, is if there's only one class, in other words, binary classification (water vs no water). Then you may use sigmoid in segmentation.
Softmax is actually a generalization of a sigmoid function. See this question over on Cross Validated for more info, but this is extra credit.
To finish answering your question, I should briefly speak about loss functions. Depending on your loss function, you may be preferring sigmoid or softmax. (E.g. if your loss function requires logits, softmax is inadequate.)
In summary, using softmax or sigmoid in the last layer depends on the problem you're working on, along with the associated loss function and other intricacies in your pipeline/software. In practice, if you have a multi-class problem, chances are you'll be using softmax. If you have one-class/binary problem, sigmoid or softmax are possibilities.
Upvotes: 5