Criteria for choosing an object detection method

Question

I'm in the research phase of my project, and I'm trying to build an object detector using a CNN. I know that there are generally two types of CNN object detectors: region-proposal-based methods (e.g., R-CNN and R-FCN) and regression/classification-based methods (e.g., YOLO and SSD). The problem is that I'm not sure which method I should use. I would like to know the usual reasons for choosing one method over the other. There are a few general criteria, such as speed vs. accuracy, but are there any other commonly used criteria?

B200011011 · Accepted Answer

Detectors fall into two categories: one-stage and two-stage. YOLO, SSD, RetinaNet, CenterNet, etc. are one-stage detectors, while R-CNN, R-FCN, Faster R-CNN, etc. are two-stage detectors.
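One practical difference between the two families is how many candidate boxes the classifier has to score. The sketch below is a minimal pure-Python illustration under simplifying assumptions: the helper names (`dense_anchors`, `label_anchors`, `sample_balanced`), the square-anchor grid, image size, stride, anchor sizes, IoU threshold, and the 256-sample batch with a 50% positive cap are all illustrative choices, not settings from any particular detector. It labels every anchor in a dense grid by IoU against two made-up ground-truth boxes (roughly what a one-stage detector must handle), then draws a small subset with positives capped, in the spirit of the sparse, sampled set a two-stage detector trains on.

```python
import random
from itertools import product


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dense_anchors(image_size, stride, sizes):
    """Hypothetical grid: one square anchor per size, centred every `stride` pixels."""
    centres = range(stride // 2, image_size, stride)
    anchors = []
    for cy, cx in product(centres, centres):
        for s in sizes:
            half = s / 2
            anchors.append((cx - half, cy - half, cx + half, cy + half))
    return anchors


def label_anchors(anchors, gt_boxes, pos_thresh=0.5):
    """1 if the anchor's best IoU with any ground-truth box reaches pos_thresh, else 0."""
    return [1 if max(iou(a, g) for g in gt_boxes) >= pos_thresh else 0 for a in anchors]


def sample_balanced(labels, batch_size=256, pos_fraction=0.5, seed=0):
    """Randomly keep at most batch_size anchors; positives are capped at pos_fraction."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n_pos = min(len(pos), int(batch_size * pos_fraction))
    n_neg = min(len(neg), batch_size - n_pos)  # negatives fill the rest of the batch
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


anchors = dense_anchors(image_size=512, stride=16, sizes=(32, 64, 128))
gt_boxes = [(100, 100, 164, 164), (300, 200, 428, 328)]  # made-up objects
labels = label_anchors(anchors, gt_boxes)
positives = sum(labels)
negatives = len(labels) - positives
print(f"dense: {len(anchors)} anchors, {positives} positive, {negatives} negative")

sampled_pos, sampled_neg = sample_balanced(labels)
print(f"sampled: {len(sampled_pos)} positive, {len(sampled_neg)} negative")
```

Even in this toy setup, negatives vastly outnumber positives across the dense grid, which is the class-imbalance problem a one-stage detector has to deal with.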

A direct quote from [1] on the advantages of two-stage detectors over one-stage detectors:

Compared to one-stage detectors, the two-stage ones have the following advantages:
1) By sampling a sparse set of region proposals, two-stage detectors filter out most of the negative proposals; while one-stage detectors directly face all the regions on the image and have a problem of class imbalance if no specialized design is introduced.
2) Since two-stage detectors only process a small number of proposals, the head of the network (for proposal classification and regression) can be larger than one-stage detectors, so that richer features will be extracted.
3) Two-stage detectors have high-quality features of sampled proposals by use of the RoIAlign [10] operation that extracts the location consistent feature of each proposal; but different region proposals can share the same feature in one-stage detectors and the coarse and spatially implicit representation of proposals may cause severe feature misalignment.
4) Two-stage detectors regress the object location twice (once on each stage) and the bounding boxes are better refined than one-stage methods.
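RetinaNet, one of the one-stage detectors listed above, is the usual example of the "specialized design" for class imbalance that point 1 mentions: it replaces cross-entropy with the focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), which down-weights easy, well-classified examples. The sketch below is a minimal pure-Python version for a single binary prediction, using α = 0.25 and γ = 2 as in the RetinaNet paper. The batch (1000 easy background anchors at p = 0.01, 10 hard foreground anchors at p = 0.3) is made up purely for illustration.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for predicted foreground probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Made-up batch: many easy background anchors, a few hard foreground anchors.
batch = [(0.01, 0)] * 1000 + [(0.3, 1)] * 10


def negative_share(loss_fn):
    """Fraction of the total loss contributed by the negative (background) examples."""
    losses = [(loss_fn(p, y), y) for p, y in batch]
    total = sum(loss for loss, _ in losses)
    return sum(loss for loss, y in losses if y == 0) / total


ce_share = negative_share(cross_entropy)
fl_share = negative_share(focal_loss)
print(f"share of loss from easy negatives: CE {ce_share:.3f}, focal {fl_share:.5f}")
```

Under plain cross-entropy the many easy negatives contribute a large part of the loss; under the focal loss their contribution becomes negligible, so training concentrates on the hard examples.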

Another quote, on accuracy vs. efficiency:

One-stage detectors are more efficient and elegant in design, but currently the two-stage detectors have domination in accuracy.
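Part of that accuracy advantage is point 4 of the quote: a two-stage detector regresses the box location twice. A common parameterization (the one used by Faster R-CNN) predicts offsets relative to a reference box: t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a). The sketch below encodes and decodes boxes in (cx, cy, w, h) form. The `partial_regressor` is a hypothetical toy stand-in for a learned head: it "cheats" by looking at the ground truth and recovers only 60% of the remaining offset at each stage. It exists only to show why a second refinement starting from the proposal, rather than from the anchor, tightens the box.

```python
import math


def encode(box, ref):
    """Offsets (t_x, t_y, t_w, t_h) that map reference box `ref` onto `box`; boxes are (cx, cy, w, h)."""
    return (
        (box[0] - ref[0]) / ref[2],
        (box[1] - ref[1]) / ref[3],
        math.log(box[2] / ref[2]),
        math.log(box[3] / ref[3]),
    )


def decode(deltas, ref):
    """Inverse of encode: apply offsets to `ref` to get a new (cx, cy, w, h) box."""
    tx, ty, tw, th = deltas
    return (ref[0] + tx * ref[2], ref[1] + ty * ref[3], ref[2] * math.exp(tw), ref[3] * math.exp(th))


def iou_cxcywh(a, b):
    """IoU of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def partial_regressor(ref, gt, recovered=0.6):
    """Hypothetical toy head: predicts only a fraction of the ideal offsets from ref to gt."""
    return tuple(recovered * t for t in encode(gt, ref))


gt = (120.0, 110.0, 80.0, 60.0)      # made-up ground truth
anchor = (100.0, 100.0, 64.0, 64.0)  # made-up anchor

proposal = decode(partial_regressor(anchor, gt), anchor)     # stage 1 (e.g. the RPN)
refined = decode(partial_regressor(proposal, gt), proposal)  # stage 2 (the detection head)

for name, box in [("anchor", anchor), ("stage 1", proposal), ("stage 2", refined)]:
    print(f"{name:8s} IoU with ground truth: {iou_cxcywh(box, gt):.3f}")
```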

One-stage detectors can be deployed on edge devices such as phones for fast, real-time detection. Because they need less computation, they can also save energy compared to more compute-intensive detectors.
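Whether a given detector is actually fast enough for real-time use on a phone or other edge device is best checked by timing it on that device. A minimal timing harness, assuming `detect` is any callable that takes one frame (`dummy_detect` below is a hypothetical placeholder for your model's inference call) and assuming 30 FPS as the real-time target, i.e. about 33.3 ms per frame:

```python
import time
from statistics import mean


def measure_latency_ms(detect, frame, warmup=3, runs=20):
    """Mean wall-clock latency of detect(frame) in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        detect(frame)  # excluded from timing: first calls may include one-off setup costs
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        detect(frame)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


def meets_realtime(latency_ms, target_fps=30.0):
    """True if one inference fits within the per-frame budget (1000 / target_fps ms)."""
    return latency_ms <= 1000.0 / target_fps


def dummy_detect(frame):
    """Placeholder 'detector' that just sleeps about 5 ms; swap in your model's inference call."""
    time.sleep(0.005)
    return []


latency = measure_latency_ms(dummy_detect, frame=None)
print(f"mean latency {latency:.1f} ms, real-time at 30 FPS: {meets_realtime(latency)}")
```

Timing the whole per-frame path (preprocessing, inference, and post-processing such as NMS) on the target hardware gives a more honest number than model-only benchmarks.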

In summary, choose a two-stage detector if accuracy matters most; otherwise, choose a one-stage detector for faster detection with good-enough accuracy.
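The summary rule can be made explicit once you have measured candidate models yourself. The sketch below uses a hypothetical `Candidate` record and `choose_detector` helper: it picks the most accurate candidate that fits a latency budget, and with no budget (accuracy first) it simply returns the most accurate model, which, per the quoted claim, will often be a two-stage detector. The `accuracy` and `latency_ms` fields are meant to hold your own measurements (for example, mAP on your validation set and latency on your target hardware); no benchmark numbers are implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    family: str        # "one-stage" or "two-stage"
    accuracy: float    # e.g. mAP you measured on your own validation set
    latency_ms: float  # latency you measured on your target hardware


def choose_detector(candidates, latency_budget_ms: Optional[float] = None) -> Optional[Candidate]:
    """Most accurate candidate within the latency budget; no budget means accuracy first."""
    feasible = [
        c for c in candidates
        if latency_budget_ms is None or c.latency_ms <= latency_budget_ms
    ]
    return max(feasible, key=lambda c: c.accuracy, default=None)
```

For example, `choose_detector(candidates, latency_budget_ms=1000 / 30)` restricts the choice to models that can run at roughly 30 FPS, and returns `None` if none of them can.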