Reputation: 146
Very new to OpenCV, and trying my hand at training a Haar classifier that can detect images of dogs from side-on. I have used this tutorial as a guide. The author suggests that a relatively effective classifier can be trained using a surprisingly small number of sample images. As per his directions, I collected 40 positive and 600 negative images, then used the script provided to generate many more samples in the form of .vec files. Training took about a week and a half through 20 stages with the following parameters:
<?xml version="1.0"?>
<opencv_storage>
<params>
<stageType>BOOST</stageType>
<featureType>HAAR</featureType>
<height>64</height>
<width>80</width>
<stageParams>
<boostType>GAB</boostType>
<minHitRate>9.9900001287460327e-01</minHitRate>
<maxFalseAlarm>5.0000000000000000e-01</maxFalseAlarm>
<weightTrimRate>9.4999999999999996e-01</weightTrimRate>
<maxDepth>1</maxDepth>
<maxWeakCount>100</maxWeakCount></stageParams>
<featureParams>
<maxCatCount>0</maxCatCount>
<featSize>1</featSize>
<mode>ALL</mode></featureParams></params>
</opencv_storage>
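For reference, the per-stage targets in these parameters compound over the 20 stages. The following is a minimal sketch, assuming the usual cascade reasoning that per-stage rates multiply and that every stage meets its target exactly; the function name is illustrative and not part of OpenCV.

```python
def cascade_targets(min_hit_rate, max_false_alarm, num_stages):
    """Return (overall hit rate, overall false-alarm rate), assuming every
    stage meets its per-stage target exactly and the rates simply multiply."""
    overall_hit = min_hit_rate ** num_stages
    overall_false_alarm = max_false_alarm ** num_stages
    return overall_hit, overall_false_alarm


# Values taken from the params block: minHitRate, maxFalseAlarm, 20 stages.
hit, false_alarm = cascade_targets(0.999, 0.5, 20)
print(f"overall hit rate    ~ {hit:.4f}")
print(f"overall false alarm ~ {false_alarm:.2e}")
```

These figures are targets per evaluated window on the training data, so they say nothing directly about how many detections appear in a whole image.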
During the last stage, the Neg Count Acceptance Ratio was down to 0.000579, which I took to mean that 0.0579% of negative samples were being wrongly classified as positive, i.e. reported as having dogs in them when they didn't. In other words, 99.942% of negative samples were being correctly rejected. These seemed like pretty good numbers to me; however, when I plugged the classifier .xml file into a face-detection program, the results were awful.
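One way to put that ratio in context is to count how many windows a detector scans in a single frame. The sketch below assumes the ratio can be read as a per-window false-acceptance rate, a hypothetical 640x480 frame, a scale factor of 1.1, and a 1-pixel stride at every pyramid level; the stride in particular is a simplification and not OpenCV's exact stepping. The first level alone has 561 x 417 = 233,937 window positions, so under these assumptions the expected number of raw false hits per frame is already above a hundred before any grouping of neighbouring rectangles.

```python
def count_windows(img_w, img_h, win_w, win_h, scale_factor):
    """Count candidate windows over an image pyramid, assuming a 1-pixel
    stride at every level (a simplification, not OpenCV's exact stepping)."""
    total = 0
    scale = 1.0
    while True:
        level_w = int(img_w / scale)
        level_h = int(img_h / scale)
        if level_w < win_w or level_h < win_h:
            break  # the shrunken image no longer fits a single window
        total += (level_w - win_w + 1) * (level_h - win_h + 1)
        scale *= scale_factor
    return total


# Hypothetical 640x480 frame; the 80x64 window comes from the training params.
windows = count_windows(640, 480, 80, 64, scale_factor=1.1)
acceptance_ratio = 0.000579  # value reported during the last training stage
expected_false_hits = windows * acceptance_ratio
print(windows, round(expected_false_hits))
```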
This is a picture of the classifier being used to analyse a completely black image (camera of the device sat flat against a bench-top to prevent any light from getting in):
(Picture a black screen with several green rectangle borders randomly positioned, some overlapping. Sadly it seems I don't have the necessary reputation to post the real thing...)
My best guess at fixing the classifier is that I need to retrain with a much larger pool of negative and positive samples.
What I really want to know is this: why are the Acceptance Ratio and the real-world performance of the classifier so different? Have I misunderstood the meaning of the Acceptance Ratio? If my understanding of the Ratio is correct, what kind of number should I expect will give me an effective classifier?
Any help would be greatly appreciated.
Upvotes: 4
Views: 3032
Reputation: 1552
When the test acceptance ratio is much worse than the training acceptance ratio, there are two possibilities:
You can check both possibilities. I also recommend testing other feature extraction methods such as HOG and LBP. To do this, you only need to change featureType to HOG or LBP.
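As a rough illustration of switching the feature type, the sketch below only assembles an `opencv_traincascade` command line; it does not run it. The helper name, the file and directory names, and the sample counts are hypothetical placeholders, and whether HOG is accepted depends on the OpenCV build, so treat that option as an assumption.

```python
def traincascade_command(feature_type, data_dir, vec_file, bg_file,
                         num_pos, num_neg, num_stages=20, width=80, height=64):
    """Build an argument list for opencv_traincascade with a given feature type."""
    # HOG is only accepted by some OpenCV builds; HAAR and LBP are the common ones.
    if feature_type not in ("HAAR", "LBP", "HOG"):
        raise ValueError(f"unsupported feature type: {feature_type}")
    return [
        "opencv_traincascade",
        "-data", data_dir,          # output directory for the trained cascade
        "-vec", vec_file,           # positive samples packed into a .vec file
        "-bg", bg_file,             # text file listing negative images
        "-numPos", str(num_pos),
        "-numNeg", str(num_neg),
        "-numStages", str(num_stages),
        "-w", str(width),
        "-h", str(height),
        "-featureType", feature_type,
    ]


# Hypothetical paths and counts, for illustration only.
cmd = traincascade_command("LBP", "classifier", "samples.vec", "negatives.txt",
                           num_pos=1000, num_neg=600)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.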
The number of positive and negative samples you need depends on the diversity of the samples. If the object's appearance varies widely in the test images, you need to increase the number of positive samples (>500) to cover all possible appearances (the same applies to the negative samples).
Do not forget to tune the detection parameters when testing images (minNeighbors, scaleFactor, minSize and maxSize).
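A minimal sketch of passing these parameters to `CascadeClassifier.detectMultiScale` from Python might look like the following. The helper names and default values are assumptions chosen for illustration (the 80x64 minimum size mirrors the training window), not recommended settings, and `cv2` is imported lazily so the parameter helper can be used without OpenCV installed.

```python
def detection_params(scale_factor=1.1, min_neighbors=5,
                     min_size=(80, 64), max_size=None):
    """Collect keyword arguments for CascadeClassifier.detectMultiScale."""
    if scale_factor <= 1.0:
        raise ValueError("scaleFactor must be greater than 1")
    params = {
        "scaleFactor": scale_factor,    # pyramid step between scales
        "minNeighbors": min_neighbors,  # overlapping hits needed to keep a box
        "minSize": min_size,            # (width, height) of the smallest object
    }
    if max_size is not None:
        params["maxSize"] = max_size    # (width, height) of the largest object
    return params


def detect(image_path, cascade_path, **overrides):
    """Run a trained cascade on one image and return the detected rectangles."""
    import cv2  # requires the opencv-python package

    classifier = cv2.CascadeClassifier(cascade_path)
    if classifier.empty():
        raise IOError(f"could not load cascade: {cascade_path}")
    image = cv2.imread(image_path)
    if image is None:
        raise IOError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return classifier.detectMultiScale(gray, **detection_params(**overrides))
```

For example, `detect("frame.png", "cascade.xml", min_neighbors=8)` (both file names hypothetical) asks for more overlapping hits before a rectangle is kept, which usually suppresses isolated false detections at the cost of some missed objects.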
Upvotes: 1