Reputation: 515
I kind of understand how to convert a fully-connected layer to a convolutional layer, according to cs231n:
FC->CONV conversion. Of these two conversions, the ability to convert an FC layer to a CONV layer is particularly useful in practice. Consider a ConvNet architecture that takes a 224x224x3 image, and then uses a series of CONV layers and POOL layers to reduce the image to an activations volume of size 7x7x512 (in an AlexNet architecture that we’ll see later, this is done by use of 5 pooling layers that downsample the input spatially by a factor of two each time, making the final spatial size 224/2/2/2/2/2 = 7). From there, an AlexNet uses two FC layers of size 4096 and finally the last FC layers with 1000 neurons that compute the class scores. We can convert each of these three FC layers to CONV layers as described above: ...
However, I was reading a paper that uses a fully convolutional regression network to predict a density map. In their description of the architecture, they claim that the middle layer (e.g. in the top row; A and B are just two different models), from 12x12x128 to 12x12x512, is fully connected but implemented as a convolution:
What I don't understand is this: in cs231n, the output of the convolutional implementation should be a vector with dimensions like 1x1x4096, so how can the paper have an output dimension like 12x12x512 for its FC-as-convolution implementation?
Upvotes: 1
Views: 1880
Reputation: 17191
The second case is not an FC layer. For a convolutional representation of a fully connected layer, the convolutional kernel should have the same shape as the input. In the cs231n case, the input is 7x7x512, convolved with a kernel of shape 7x7x512, and there are 4096 such kernels, so we get 1x1x4096 as output. In the second case, it is just a normal convolution: 12x12x128 convolved with 3x3x128 kernels (with padding), and there are 512 such kernels, giving an output of size 12x12x512.
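To make the shape argument concrete, here is a minimal NumPy sketch. It is not taken from cs231n or from the paper: `conv2d` is a hypothetical naive helper written for illustration, and the channel counts are scaled down (8 and 16 instead of 512 and 4096, and 8 and 32 instead of 128 and 512) so it runs quickly. It checks that a kernel the same shape as the input reproduces an FC layer, while a padded 3x3 convolution keeps the 12x12 spatial size.

```python
import numpy as np

def conv2d(x, w, pad=0):
    """Naive stride-1 convolution (cross-correlation), for illustration only.
    x: (H, W, C_in), w: (kH, kW, C_in, C_out) -> (H_out, W_out, C_out)."""
    if pad:
        x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    kH, kW, _, C_out = w.shape
    out = np.empty((H - kH + 1, W - kW + 1, C_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kH, j:j + kW, :]
            # Contract the patch with every kernel at once.
            out[i, j] = np.tensordot(patch, w, axes=3)
    return out

rng = np.random.default_rng(0)

# Case 1: FC as convolution. The kernel has the same shape as the input.
x = rng.standard_normal((7, 7, 8))
w_fc = rng.standard_normal((7 * 7 * 8, 16))      # ordinary FC weight matrix
fc_out = x.reshape(-1) @ w_fc                    # shape (16,)
conv_out = conv2d(x, w_fc.reshape(7, 7, 8, 16))  # shape (1, 1, 16)
fc_equals_conv = np.allclose(fc_out, conv_out[0, 0])

# Case 2: normal convolution. 3x3 kernels with padding 1 keep the 12x12 size.
y = rng.standard_normal((12, 12, 8))
w_conv = rng.standard_normal((3, 3, 8, 32))
normal_out = conv2d(y, w_conv, pad=1)            # shape (12, 12, 32)

print(conv_out.shape, fc_equals_conv, normal_out.shape)
```

The first case collapses to 1x1 spatially because the kernel can only sit at one position; the second keeps 12x12 because a small kernel slides over the padded input.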
Upvotes: 1
Reputation: 24581
You are right, this is confusing. The layers they label "FC" really are 1x1 convolution layers. I think their choice of terminology is guided by the fact that these "FC" layers are the smallest spatially.
Or, to put it differently, if you start from a convnet with FC layers, transform it into a pure convolutional net as described, then extend its input spatially, your former FC layers would look like the "FC" ones above: they would correspond to the layers with the smallest spatial dimensions.
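A rough sketch of that last point, under assumptions of my own: the sizes are made up (8, 16 and 10 channels), `conv2d_valid` is a hypothetical naive helper, and nonlinearities are left out for brevity. Two former FC layers are expressed as a 7x7 convolution followed by a 1x1 convolution. On the original 7x7 input they give a 1x1 output; on a spatially extended 12x12 input they give a 6x6 map, where each position is the original FC stack applied to one 7x7 window.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive stride-1, no-padding convolution, for illustration only.
    x: (H, W, C_in), w: (kH, kW, C_in, C_out)."""
    kH, kW, _, C_out = w.shape
    H_out, W_out = x.shape[0] - kH + 1, x.shape[1] - kW + 1
    out = np.empty((H_out, W_out, C_out))
    for i in range(H_out):
        for j in range(W_out):
            out[i, j] = np.tensordot(x[i:i + kH, j:j + kW], w, axes=3)
    return out

rng = np.random.default_rng(0)
w1 = rng.standard_normal((7, 7, 8, 16))   # former FC layer 1, as a 7x7 conv
w2 = rng.standard_normal((1, 1, 16, 10))  # former FC layer 2, as a 1x1 conv

small = rng.standard_normal((7, 7, 8))    # original feature-map size
large = rng.standard_normal((12, 12, 8))  # spatially extended input

small_out = conv2d_valid(conv2d_valid(small, w1), w2)  # shape (1, 1, 10)
large_out = conv2d_valid(conv2d_valid(large, w1), w2)  # shape (6, 6, 10)

# One output position equals the original FC stack applied to one 7x7 window.
window = large[2:9, 3:10]
fc_on_window = (window.reshape(-1) @ w1.reshape(-1, 16)) @ w2.reshape(16, 10)
matches = np.allclose(large_out[2, 3], fc_on_window)

print(small_out.shape, large_out.shape, matches)
```

So a 1x1 convolution acts like a small FC layer over channels applied independently at every spatial position, which may be why such layers get labelled "FC" even though their output is not 1x1.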
Upvotes: 0