tidy

Reputation: 5087

Why do filters and feature layers have the same number of channels?

Some object detection frameworks, such as SSD (Single Shot MultiBox Detector) and Faster R-CNN, have “convolutional filters” for classification and regression. The following is from the SSD paper:

For a feature layer of size m × n with p channels, the basic element for predicting parameters of a potential detection is a 3 × 3 × p small kernel that produces either a score for a category, or a shape offset relative to the default box coordinates. At each of the m × n locations where the kernel is applied, it produces an output value.

My question is: does the number of “small kernels” have to be p? What about setting an arbitrary number k (which is not the same as the number of feature channels)?

Upvotes: 4

Views: 269

Answers (1)

Vijay Mariappan

Reputation: 17201

[Figure: SSD architecture showing the extra feature layers and their prediction kernels]

In the figure, the “Extra Feature Layers” part shows how the small kernel extracts a p-dimensional vector from each output location, which predicts detections for different aspect ratios and class categories.

For example, for the first convolutional feature map, p is 3 × (classes + 4), and for the second one it is 6 × (classes + 4). The numbers 3 and 6 are the numbers of anchor boxes defined for those feature maps, and each of those anchor boxes outputs classes class scores plus 4 box coordinates.

So you need to fix p based on the number of anchor boxes you choose for each feature map and the number of classes you want to detect.
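
As a rough illustration (the class count of 21 is a hypothetical choice, e.g. 20 PASCAL VOC classes plus background), p works out like this:

```python
# Hypothetical setup: 20 object classes + 1 background class, and the
# 3 / 6 anchor boxes per location mentioned above.
num_classes = 21
anchors_per_location = [3, 6]

for num_anchors in anchors_per_location:
    p = num_anchors * (num_classes + 4)   # class scores + 4 box offsets per anchor
    print(f"{num_anchors} anchors -> p = {p} output channels")

# 3 anchors -> p = 75 output channels
# 6 anchors -> p = 150 output channels
```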

My question is: does the number of “small kernels” have to be p? What about setting an arbitrary number k (which is not the same as the number of feature channels)?

The feature layer's channel count is the result of the convolution with the 3×3×p kernel, so it will always be p, which is the number of output channels of the kernel. And note that 3×3×p is actually 3 × 3 × in_channels × p: for example, the first prediction feature layer is obtained by convolving the 38×38×512 output from VGG with a 3×3×512×p kernel to get a 38×38×p output.
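
For concreteness, here is a minimal PyTorch sketch of that convolution (the name `pred_head` and the anchor/class counts are assumptions for illustration, not part of any SSD reference implementation):

```python
import torch
import torch.nn as nn

# Assumed numbers: 3 anchor boxes and 21 classes (20 + background).
num_anchors, num_classes = 3, 21
p = num_anchors * (num_classes + 4)        # 75 output channels

# The 38x38x512 VGG feature layer (NCHW layout).
feature_map = torch.randn(1, 512, 38, 38)

# 3x3x512xp kernel: in_channels=512, out_channels=p.
pred_head = nn.Conv2d(in_channels=512, out_channels=p, kernel_size=3, padding=1)

out = pred_head(feature_map)
print(out.shape)   # torch.Size([1, 75, 38, 38]) -> a 38x38xp prediction map
```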

Upvotes: 2
