Why do filters and feature layers have the same number of channels?

Question

Some object detection frameworks, such as SSD (Single Shot MultiBox Detector) and Faster R-CNN, use “convolutional filters” for classification and regression. The following is from the SSD paper:

For a feature layer of size m × n with p channels, the basic element for predicting parameters of a potential detection is a 3 × 3 × p small kernel that produces either a score for a category, or a shape offset relative to the default box coordinates. At each of the m × n locations where the kernel is applied, it produces an output value.

My question is: does the number of “small kernels” have to be p? What about setting an arbitrary number k (different from the number of feature channels)?

Vijay Mariappan · Accepted Answer

In the figure, the “Extra Feature Layers” part shows how the small kernels produce a p-dimensional vector at each output location. That vector holds the predictions for the different aspect ratios and class categories.

For example, for the first convolutional feature map, p is 3 x (classes + 4), and for the second one it is 6 x (classes + 4). The numbers 3 and 6 are the numbers of anchor boxes defined for those feature maps, and each anchor box gets classes + 4 outputs: one score per class plus 4 box coordinates.

So p is fixed by the number of anchor boxes you choose for each feature map and the number of classes you want to detect.
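The arithmetic above can be sketched in a few lines of Python. The function name `head_output_channels` and the choice of 21 classes are assumptions made for illustration; they do not come from the answer or from any SSD implementation.

```python
def head_output_channels(num_anchors: int, num_classes: int) -> int:
    """Outputs per location: each anchor gets one score per class plus 4 box coordinates."""
    return num_anchors * (num_classes + 4)


# Assumed example value: 21 classes (not stated in the answer).
NUM_CLASSES = 21

p_first = head_output_channels(3, NUM_CLASSES)   # 3 anchor boxes per location
p_second = head_output_channels(6, NUM_CLASSES)  # 6 anchor boxes per location

print(p_first, p_second)  # 75 150
```

Changing the number of anchor boxes or classes changes p, which is why p cannot be picked independently of those two choices.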

My question is: does the number of “small kernels” have to be p? What about setting an arbitrary number k (different from the number of feature channels)?

The channels of the feature layer are the result of the convolution with the 3x3xp kernel, so the layer always has p channels, where p is the number of output channels of the kernel. Note that 3x3xp is shorthand for 3 x 3 x in_channels x p. For example, the first feature layer is obtained by convolving the 38x38x512 feature map from VGG with a 3x3x512xp kernel to get a 38x38xp output.
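A minimal shape-only sketch of that example follows. The helper `conv2d_output_shape` is hypothetical, and padding 1 with stride 1 is assumed because the answer states that the 38x38 spatial size is preserved. It shows that the kernel depth must match the input channels (512), while the output depth equals the number of kernels (p).

```python
def conv2d_output_shape(input_hwc, kernel_hwio, padding=1, stride=1):
    """Return (height, width, channels) of a 2D convolution output.

    input_hwc:   (height, width, in_channels)
    kernel_hwio: (kernel_h, kernel_w, in_channels, out_channels)
    """
    h, w, c_in = input_hwc
    kh, kw, k_in, k_out = kernel_hwio
    if c_in != k_in:
        raise ValueError("kernel depth must equal the input's channel count")
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    # The output depth is the number of kernels, independent of in_channels.
    return (out_h, out_w, k_out)


p = 3 * (21 + 4)  # assumed: 3 anchor boxes, 21 classes
print(conv2d_output_shape((38, 38, 512), (3, 3, 512, p)))  # (38, 38, 75)
```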
