Reputation: 655
I was reading a decent paper S-DCNet and I fell upon a section (page3,table1,classifier) where a convolution layer has been used on the feature map in order to produce a binary classification output as part of an internal process. Since I am a noob and when someone talks to me about classification I automatically make a synapse relating to FCs combined with softmax, I started wondering ... Is this a possible thing to do? Can indeed a convolutional layer be used to classify a binary outcome? The whole concept triggered my imagination so much that I insist on getting answers...
Honestly, how does this actually work? What is the difference between using a convolution filter instead of a fully connected layer for classification purposes?
Edit (Uncertain answer on how does it work): I asked a colleague and he told me that using a filter of the same shape as the length-width shape of the feature map at the current stage, may lead to a learnable binary output (considering that you also reduce the #channels of the feature map to a single channel). But I still don't understand the motivations behind such a technique ..
Upvotes: 1
Views: 351
Reputation: 27946
Using convolutions as FCs can be done (for example) with filters of spatial size (1,1) and with depth of the same size as the FC input size.
The resulting feature map would be of the same size as the input feature map, but each pixel would be the output of a "FC" layer whose weights are the weights of the shared 1x1 conv filter.
This kind of thing is used mainly for semantic segmentation, meaning classification per pixel. U-net is a good example if memory serves.
Also see this.
Also note that 1x1 convolutions have other uses as well.
paperswithcode probably some of the nets there use this trick.
Upvotes: 1