Use of base anchor size in Single Shot Multi-box detector

Question

I was digging into the TensorFlow Object Detection API to see how anchor boxes are generated for the SSD architecture. In this py file, where the anchor boxes are generated on the fly, I am unable to understand the usage of base_anchor_size. The corresponding paper makes no mention of such a thing. Two questions in short:

What is the use of the base_anchor_size parameter? Is it important?
How does this parameter affect training when the original input image is square, and when it isn't?

netanel-sam · Accepted Answer

In the SSD architecture the anchor scales are fixed in advance, e.g. linearly spaced values across the range 0.2-0.9. These values are relative to the image size. For example, given a 320x320 image, the smallest anchor (with 1:1 ratio) will be 64x64 and the largest anchor will be 288x288. However, suppose you wish to feed your model a larger image, e.g. 640x640, without changing the anchor sizes in pixels (for example because these are images of far-away objects, so there's no need for large anchors; leaving the anchor sizes untouched also means you don't have to fine-tune the model on the new resolution). Then you can simply set base_anchor_size=0.5, meaning the anchor scales would be 0.5*[0.2-0.9] relative to the input image size.

The default value for this parameter is [1.0, 1.0], meaning it has no effect.
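
As a rough illustration of the arithmetic above, here is a minimal Python sketch. It is not the Object Detection API's code: the helper names `ssd_scales` and `square_anchor_sides` are made up for this example, and six feature-map layers are assumed.

```python
def ssd_scales(num_layers=6, min_scale=0.2, max_scale=0.9):
    """Linearly spaced anchor scales between min_scale and max_scale.

    num_layers=6 is an assumption made for this example.
    """
    step = (max_scale - min_scale) / (num_layers - 1)
    return [min_scale + step * i for i in range(num_layers)]


def square_anchor_sides(image_side, base_anchor_size=1.0):
    """Side length in pixels of the 1:1 anchor at each scale (hypothetical helper).

    The relative scale is multiplied by base_anchor_size and then by the
    side of a square input image.
    """
    return [s * base_anchor_size * image_side for s in ssd_scales()]


# 320x320 input with the default base size.
small = square_anchor_sides(320, base_anchor_size=1.0)

# 640x640 input with base_anchor_size=0.5: same anchor sizes in pixels.
large = square_anchor_sides(640, base_anchor_size=0.5)

print([round(x) for x in small])
print([round(x) for x in large])
```

Both calls should give the same pixel sizes, running from about 64 to about 288, which is why halving the base size lets the model keep its anchors when the input resolution doubles.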

The entries correspond to [height, width] relative to the largest square that fits in the image, i.e. [min(image_height,image_width),min(image_height,image_width)]. So if, for example, your input image is VGA (640x480), then the base_anchor_size is taken to be relative to [480,480].
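
The non-square case can be sketched the same way. This is a simplified illustration of the behaviour described above, not the library's implementation; `effective_base_anchor_size` and `anchor_size_px` are hypothetical helper names.

```python
def effective_base_anchor_size(image_height, image_width,
                               base_anchor_size=(1.0, 1.0)):
    """Rescale [height, width] base sizes to the largest inscribed square.

    Returns base sizes expressed relative to the full image height and width.
    """
    min_side = min(image_height, image_width)
    scale_h = min_side / image_height
    scale_w = min_side / image_width
    return (scale_h * base_anchor_size[0], scale_w * base_anchor_size[1])


def anchor_size_px(scale, image_height, image_width,
                   base_anchor_size=(1.0, 1.0)):
    """Pixel (height, width) of a 1:1 anchor at a given relative scale."""
    base_h, base_w = effective_base_anchor_size(
        image_height, image_width, base_anchor_size)
    return (scale * base_h * image_height, scale * base_w * image_width)


# VGA input: 640 wide, 480 high. The reference square is 480x480.
h_px, w_px = anchor_size_px(0.2, image_height=480, image_width=640)
print(round(h_px), round(w_px))
```

With the default base size, the 0.2-scale anchor on a VGA image comes out at roughly 96x96 pixels, i.e. 0.2 of the 480-pixel side in both directions, so a 1:1 anchor stays square in pixels even though the image is not.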
