Gal Elias

Reputation: 11

Some questions about the required 300x300 input of the quantized Mobilenet-SSD V2

I want to retrain the quantized Mobilenet-SSD V2 model, so I downloaded the unlabeled folder from COCO. This model requires an input size of 300x300, but I managed to retrain it once on pictures of a different size and it worked (poorly, but it worked). Also, the code that uses the retrained model resizes the camera input to 500x500, and that works too. So my question is: why is the required input stated as 300x300 if other sizes work as well? Do I need to resize the whole dataset to 300x300 before I label it? I know the model applies convolutions to the input, so I don't think the size really matters (correct me if I'm wrong). As far as I know, the convolution simply continues until it reaches the end of the input.

Thanks for helping!

Upvotes: 0

Views: 1774

Answers (1)

Tamir Tapuhi

Reputation: 426

If I understand correctly, you are using the TF Object Detection API. A given model, such as mobilenet-v2-ssd, contains 3 main blocks: [Preprocessing (normalizing and resizing)] --> [Detector (backbone + detection heads)] --> [Postprocessing (bbox decoding + NMS)]

When they talk about the required input, they mean the input to the detector. The checkpoint itself contains the full pipeline, which means that the preprocessing unit will do the work for you, so there is no need to resize your images to 300x300 beforehand.

If for some reason you intend to feed the input directly to the detector yourself, you have to apply the same preprocessing that was used during training.

BTW: in the training config file (https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config) you can see the resize that was defined: image_resizer { fixed_shape_resizer { height: 300 width: 300 } }. The normalization is the mobilenet normalization (changing the dynamic range of the input from [0,255] to [-1,1]).
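If you do end up feeding the detector directly, the two preprocessing steps described above (resize to 300x300, then map [0,255] to [-1,1]) could be sketched roughly like this. This is only an illustration, not the API's own code: the function names are made up for the example, and the nearest-neighbour resize is used purely to keep the sketch dependent on NumPy alone, so it will not match the interpolation the real pipeline uses.

```python
import numpy as np

# Target size taken from the fixed_shape_resizer entry in the config.
TARGET_HEIGHT = 300
TARGET_WIDTH = 300


def resize_nearest(image, height, width):
    """Resize an HxWxC image with nearest-neighbour sampling.

    Hypothetical helper for illustration only; the real pipeline
    uses its own resizer and interpolation method.
    """
    src_h, src_w = image.shape[:2]
    # For every output row/column, pick the matching source index.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return image[rows[:, None], cols]


def mobilenet_normalize(image):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return image.astype(np.float32) / 127.5 - 1.0


def preprocess(image):
    """Resize, normalize, and add a batch dimension."""
    resized = resize_nearest(image, TARGET_HEIGHT, TARGET_WIDTH)
    return mobilenet_normalize(resized)[np.newaxis, ...]


# Example: a fake 500x500 RGB camera frame.
frame = np.random.randint(0, 256, size=(500, 500, 3), dtype=np.uint8)
batch = preprocess(frame)
print(batch.shape, batch.dtype)
```

The resulting array has shape (1, 300, 300, 3), regardless of the size of the frame you start from, which is why images of other sizes appear to "work" when the full pipeline does this step for you.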

Upvotes: 3
