Sam

Reputation: 515

Confusion about implementing a convolution layer as fully connected layer

I kind of understand how we convert a fully-connected layer to a convolutional layer, according to cs231n:

FC->CONV conversion. Of these two conversions, the ability to convert an FC layer to a CONV layer is particularly useful in practice. Consider a ConvNet architecture that takes a 224x224x3 image, and then uses a series of CONV layers and POOL layers to reduce the image to an activations volume of size 7x7x512 (in an AlexNet architecture that we’ll see later, this is done by use of 5 pooling layers that downsample the input spatially by a factor of two each time, making the final spatial size 224/2/2/2/2/2 = 7). From there, an AlexNet uses two FC layers of size 4096 and finally the last FC layers with 1000 neurons that compute the class scores. We can convert each of these three FC layers to CONV layers as described above: ...

However, I was reading a paper that uses a fully convolutional regression network to predict density maps. In their description of the architecture (the top row; A and B are just two different models), they claim that the middle layer going from 12x12x128 to 12x12x512 is fully-connected but implemented as convolution: [figure: network architecture diagram]

What I don't understand is: in cs231n, the output of the convolution implementation should be a vector with dimensions like 1x1x4096. How can the paper have an output dimension of 12x12x512 for their FC-as-convolution implementation?

Upvotes: 1

Views: 1880

Answers (2)

Vijay Mariappan

Reputation: 17191

The second case is not an FC layer. For a convolutional representation of a fully connected layer, the convolutional kernel must have the same shape as the input. In the cs231n case, the 7x7x512 input is convolved with kernels of shape 7x7x512, and there are 4096 such kernels, so we get 1x1x4096 as output. In the second case, it is just normal convolution: the 12x12x128 input is convolved with 3x3x128 kernels (with padding), and there are 512 such kernels, giving an output of size 12x12x512.
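The equivalence in the first case can be checked numerically. Below is a minimal sketch in plain NumPy (toy sizes standing in for 7x7x512 -> 4096; all names are assumed, not from the paper): an FC layer applied to the flattened input produces exactly the same numbers as convolving with kernels that have the full shape of the input, each kernel contributing one scalar of the 1x1xN output.

```python
import numpy as np

# Toy stand-ins for the 7x7x512 input and 4096 outputs (sizes assumed).
H, W, C, N = 3, 3, 2, 4
x = np.random.randn(H, W, C)

# FC view: flatten the input and multiply by an (N, H*W*C) weight matrix.
W_fc = np.random.randn(N, H * W * C)
fc_out = W_fc @ x.ravel()                 # shape (N,)

# CONV view: the same weights reshaped into N kernels, each the full
# size of the input, so each kernel fits at exactly one position and
# yields a single scalar -> the output is 1x1xN.
kernels = W_fc.reshape(N, H, W, C)
conv_out = np.array([(k * x).sum() for k in kernels])

assert np.allclose(fc_out, conv_out)
```

A 3x3x128 kernel with padding, by contrast, fits at every one of the 12x12 positions, which is why the paper's layer keeps its 12x12 spatial extent.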

Upvotes: 1

P-Gn

Reputation: 24581

You are right, this is confusing. The layers they label "FC" really are 1x1 convolution layers. I think their choice of terminology is guided by the fact that these "FC" layers are the spatially smallest.

Or, to put it differently: if you start from a convnet with FC layers, transform it into a pure convolutional net as described, and then extend its input spatially, your former FC layers would look like the "FC" ones above: they would correspond to the layers with the smallest spatial dimensions.
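One way to see why a 1x1 convolution behaves like a replicated FC layer (a minimal NumPy sketch with assumed toy sizes, not the paper's code): a 1x1 kernel is just a weight matrix applied to the channel vector at every spatial position, so on a 1x1xC input it collapses to an ordinary FC layer, and on a larger input it evaluates that same FC layer everywhere.

```python
import numpy as np

# Toy sizes (assumed): a 4x4 spatial map, 3 input channels, 5 outputs.
H, W, C_in, C_out = 4, 4, 3, 5
x = np.random.randn(H, W, C_in)
W1 = np.random.randn(C_out, C_in)   # a 1x1 kernel is just a matrix

# 1x1 convolution: contract the channel axis at every position (h, w).
conv_out = np.einsum('hwc,oc->hwo', x, W1)

# FC view: apply the same matrix to each pixel's channel vector.
fc_out = np.array([[W1 @ x[i, j] for j in range(W)] for i in range(H)])

assert np.allclose(conv_out, fc_out)
assert conv_out.shape == (H, W, C_out)   # spatial size is preserved
```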

Upvotes: 0
