Computer vision detection on small object bad results why?

Question

Currently, for detection (localisation + recognition tasks) we use mainly deep learning algorithm in computer vision. Two types of detector exist :

one stage : SSD, YOLO, retinanet, ...
two stage : RCNN, Fast RCNN and faster RCNN for example

Using these detectors on very small objects (10 pixels for example) is a very challenging tasks and it seems the one stage algorithm are worse than the two stage algorithm. But I do not really understand why it works better on Faster RCNN for example. In fact, the one and two stage detector use both of them the anchor concept, and most of them use the same backbone like VGG16 or resnet50/resnet101. That means the receptive fields is the same. For example, I tried to detect very small object on retinanet and on faster RCNN. On retinanet, small object are not detected contrary to faster rcnn. I do not understand why. What is the explication theoretically ? (same backbone : resnet50)

Ryan · Accepted Answer

I think in general networks like retinaNet are trying to bridge the gap you mention.Usually in one stage networks we will have anchor boxes of varying scales in the feature maps produced by the Backbone net, These feature maps are produced by heavily down sampling the input image, A lot of information about small object might be lost while performing this operation.While this is the case with one stage detectors, In two stage detectors because of flexibility of the RPN network, The RPN network may still propose regions which are small and this may help it to perform slightly better than its one stage counterparts.

I don't think you should be very surprised that both of these might use the same backbone, After the conv features are extracted both networks use different methods to perform detection.

Hope this helps, Let me know if i wasn't clear enough,or you have questions.

Computer vision detection on small object bad results why?

Answers (1)

Related Questions