Reputation: 11439
I'm trying to implement a traffic sign recognizer with OpenCV and the SURF method. My problem is that I get inconsistent results (sometimes really accurate, sometimes obviously wrong) and I can't understand why. Here is how I implemented the comparison:
The contour detection works perfectly well: using a Gaussian blur and Canny edge detection, I manage to find a contour similar to this one:
Then I extract the image corresponding to this contour and compare it to traffic sign template images such as these:
cvExtractSURF returns 189 descriptors for the contour image. Then I use the naiveNearestNeighbor method to find similarities between my contour image and each template image.
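For reference, a minimal sketch of this extract-and-count step in C, modeled on OpenCV's find_obj sample (the count_good_points wrapper, the Hessian threshold of 500 and the assumption of 8-bit grayscale inputs are illustrative, not necessarily the exact code used; naiveNearestNeighbor follows the sample's logic):

    #include <opencv/cv.h>
    #include <opencv/highgui.h>

    /* Squared Euclidean distance between two SURF descriptors, with early
     * exit once the running total exceeds 'best' (as in find_obj.c). */
    static double compareSURFDescriptors( const float* d1, const float* d2,
                                          double best, int length )
    {
        double total_cost = 0;
        int i;
        for( i = 0; i < length; i++ )
        {
            double t = d1[i] - d2[i];
            total_cost += t * t;
            if( total_cost > best )
                break;
        }
        return total_cost;
    }

    /* Nearest-neighbor search from find_obj.c: accept the match only if the
     * nearest squared distance is below 0.6 times the second nearest. */
    static int naiveNearestNeighbor( const float* vec, int laplacian,
                                     const CvSeq* model_keypoints,
                                     const CvSeq* model_descriptors )
    {
        int length = (int)(model_descriptors->elem_size / sizeof(float));
        int i, neighbor = -1;
        double d, dist1 = 1e6, dist2 = 1e6;
        CvSeqReader reader, kreader;

        cvStartReadSeq( model_keypoints, &kreader, 0 );
        cvStartReadSeq( model_descriptors, &reader, 0 );
        for( i = 0; i < model_descriptors->total; i++ )
        {
            const CvSURFPoint* kp = (const CvSURFPoint*)kreader.ptr;
            const float* mvec = (const float*)reader.ptr;
            CV_NEXT_SEQ_ELEM( kreader.seq->elem_size, kreader );
            CV_NEXT_SEQ_ELEM( reader.seq->elem_size, reader );
            if( laplacian != kp->laplacian )   /* sign of Laplacian must agree */
                continue;
            d = compareSURFDescriptors( vec, mvec, dist2, length );
            if( d < dist1 ) { dist2 = dist1; dist1 = d; neighbor = i; }
            else if( d < dist2 ) dist2 = d;
        }
        return ( dist1 < 0.6 * dist2 ) ? neighbor : -1;
    }

    /* Count how many descriptors of the contour image find a match in the
     * template image. Both images are assumed 8-bit grayscale. */
    int count_good_points( IplImage* contour_img, IplImage* template_img )
    {
        CvMemStorage* storage = cvCreateMemStorage(0);
        CvSURFParams params = cvSURFParams( 500, 1 );  /* hessian thresh, extended */
        CvSeq *kp1 = 0, *desc1 = 0, *kp2 = 0, *desc2 = 0;
        CvSeqReader reader, kreader;
        int i, good = 0;

        cvExtractSURF( contour_img, 0, &kp1, &desc1, storage, params, 0 );
        cvExtractSURF( template_img, 0, &kp2, &desc2, storage, params, 0 );

        cvStartReadSeq( kp1, &kreader, 0 );
        cvStartReadSeq( desc1, &reader, 0 );
        for( i = 0; i < desc1->total; i++ )
        {
            const CvSURFPoint* kp = (const CvSURFPoint*)kreader.ptr;
            const float* descriptor = (const float*)reader.ptr;
            CV_NEXT_SEQ_ELEM( kreader.seq->elem_size, kreader );
            CV_NEXT_SEQ_ELEM( reader.seq->elem_size, reader );
            if( naiveNearestNeighbor( descriptor, kp->laplacian, kp2, desc2 ) >= 0 )
                good++;
        }
        cvReleaseMemStorage( &storage );
        return good;   /* the similarity ratio is good / desc1->total */
    }

The ratio mentioned below is then good divided by desc1->total.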
Here are my results:
6/189 for the first template (which is the one I'm expecting to find)
92/189 for the second template (which is obviously very different from the contour image in every way)
I really don't understand these results…
Here is the list of the steps I perform:
To evaluate the similarity between the two images I use the ratio:
number of good points / total number of descriptors
P.S.: For reference, I followed this tutorial: http://www.emgu.com/wiki/index.php/Traffic_Sign_Detection_in_CSharp
I also used the find_obj sample of OpenCV and adapted it to C.
Upvotes: 6
Views: 2624
Reputation: 21602
To evaluate the similarity between the two images I use the ratio: number of good points / total number of descriptors
I think that's a bad metric. You need a metric based on the descriptor vectors, and you must also use spatial information between the points.
This is because SIFT-like features match only the "same points", not merely similar points. Maybe you can tweak this by changing the matching criterion: in OpenCV, the criterion is to take the nearest point (by descriptor distance) and accept it only if no other descriptor is within a 0.6 distance ratio of it.
The descriptor matching consists of two steps. The first step follows the simple but powerful matching algorithm of David Lowe. More precisely, to see whether a descriptor A in the left image matches some descriptor in the right image, we first compute the Euclidean distance d(A, A') between the descriptor A in the left image and all the descriptors A' in the right image. If the nearest distance, say d(A, A1'), is smaller than k times the second-nearest distance, say d(A, A2'), then A and A1' are considered matched. We set k = 0.6.
Maybe you can change k, but I think a larger value gives more false positives.
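To make the criterion concrete, here is a minimal sketch of that ratio test in C on raw descriptor arrays (the dist2 helper and the row-major model layout are assumptions for illustration; this is not OpenCV API):

    #include <float.h>

    /* Squared Euclidean distance between two descriptor vectors. */
    static double dist2( const float* a, const float* b, int len )
    {
        double s = 0.0;
        int i;
        for( i = 0; i < len; i++ )
        {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /* Lowe's ratio test with an adjustable k: match 'query' against the 'n'
     * descriptors stored row by row in 'model', and accept the nearest one
     * only if d(nearest) < k * d(second nearest). Returns the matched row
     * index, or -1 if the match is ambiguous. */
    int lowe_match( const float* query, const float* model,
                    int n, int len, double k )
    {
        double best = DBL_MAX, second = DBL_MAX;
        int i, best_i = -1;

        for( i = 0; i < n; i++ )
        {
            double d = dist2( query, model + (size_t)i * len, len );
            if( d < best )        { second = best; best = d; best_i = i; }
            else if( d < second )   second = d;
        }
        /* the distances here are squared, so the ratio test uses k*k */
        return ( best < k * k * second ) ? best_i : -1;
    }

Calling it with k = 0.6 reproduces the criterion quoted above; raising k accepts more matches, but as said, more of them will be false positives.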
Upvotes: 0
Reputation: 2653
SURF descriptors are fine for comparing richly textured images... I think there isn't enough texture in traffic signs for them.
When extracting descriptors, "salient points" are located first, in your case for example at the corners of the rectangular marks on both signs (the rectangle and the letter P). Then local properties are collected for them: for instance, what a corner of a rectangle looks like from close up, blurred and in grayscale.
Then, these descriptors are matched to the corners-of-rectangles from the letter P. They aren't all that different (as we're not taking any shape information into account). Maybe the corners of the letter P are a little closer to those of the "no entry" sign. Randomly.
Of course, all of this is just speculation… the only way to find out is to debug it thoroughly. Try displaying the images with little circles where the descriptors were found (the circle size could depend on the scale the point was found at). Or put both images into one IplImage and draw lines between the matching descriptors. Something like this:
http://www.flickr.com/photos/22191989@N00/268039276
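With the old C API, such a debug view can be sketched like this (the pairs array, the 3-channel input assumption and the circle-radius formula are illustrative choices, not the sample's exact code):

    #include <opencv/cv.h>
    #include <opencv/highgui.h>

    /* Put both images side by side on one canvas, draw circles at the
     * keypoints of the first image (radius growing with detection scale)
     * and lines between matched pairs. Inputs are assumed 8-bit, 3-channel;
     * 'pairs' holds npairs (index_in_a, index_in_b) couples. */
    IplImage* draw_matches( const IplImage* a, const IplImage* b,
                            const CvSeq* kp_a, const CvSeq* kp_b,
                            const int* pairs, int npairs )
    {
        int h = a->height > b->height ? a->height : b->height;
        IplImage* canvas = cvCreateImage( cvSize( a->width + b->width, h ),
                                          IPL_DEPTH_8U, 3 );
        int i;

        cvZero( canvas );
        cvSetImageROI( canvas, cvRect( 0, 0, a->width, a->height ) );
        cvCopy( a, canvas, 0 );
        cvSetImageROI( canvas, cvRect( a->width, 0, b->width, b->height ) );
        cvCopy( b, canvas, 0 );
        cvResetImageROI( canvas );

        for( i = 0; i < kp_a->total; i++ )
        {
            const CvSURFPoint* p = (const CvSURFPoint*)cvGetSeqElem( kp_a, i );
            cvCircle( canvas, cvPointFrom32f( p->pt ),
                      cvRound( p->size * 0.25 ), CV_RGB(0,255,0), 1, 8, 0 );
        }
        for( i = 0; i < npairs; i++ )
        {
            const CvSURFPoint* p = (const CvSURFPoint*)cvGetSeqElem( kp_a, pairs[2*i] );
            const CvSURFPoint* q = (const CvSURFPoint*)cvGetSeqElem( kp_b, pairs[2*i+1] );
            cvLine( canvas,
                    cvPoint( cvRound( p->pt.x ), cvRound( p->pt.y ) ),
                    cvPoint( cvRound( q->pt.x ) + a->width, cvRound( q->pt.y ) ),
                    CV_RGB(255,0,0), 1, 8, 0 );
        }
        return canvas;   /* show with cvShowImage, free with cvReleaseImage */
    }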
As for how to fix this… what about using the same shape matching method for the inside of the sign that you use for detecting its outside contour? (For example, you could look for P-shaped objects once a sign is found.)
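One concrete option in the C API is Hu-moment shape matching via cvMatchShapes; here is a minimal sketch (the binarization threshold and the acceptance cutoff are assumptions to tune):

    #include <opencv/cv.h>
    #include <opencv/highgui.h>
    #include <float.h>

    /* Compare the inner contour of the detected sign ROI against a template
     * using Hu moments (rotation/scale invariant); lower score = more similar.
     * Note: both inputs must be 8-bit single-channel and are modified in
     * place, so pass clones in real code. */
    double inner_shape_score( IplImage* sign_roi, IplImage* template_img )
    {
        CvMemStorage* storage = cvCreateMemStorage(0);
        CvSeq *c1 = 0, *c2 = 0;
        double score = DBL_MAX;

        /* binarize: cvFindContours expects a binary 8-bit image */
        cvThreshold( sign_roi, sign_roi, 128, 255, CV_THRESH_BINARY );
        cvThreshold( template_img, template_img, 128, 255, CV_THRESH_BINARY );

        if( cvFindContours( sign_roi, storage, &c1, sizeof(CvContour),
                            CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE, cvPoint(0,0) ) > 0 &&
            cvFindContours( template_img, storage, &c2, sizeof(CvContour),
                            CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE, cvPoint(0,0) ) > 0 )
        {
            score = cvMatchShapes( c1, c2, CV_CONTOURS_MATCH_I1, 0 );
        }
        cvReleaseMemStorage( &storage );
        return score;   /* e.g. accept if score < 0.1; the cutoff needs tuning */
    }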
Upvotes: 6