Reputation: 724
I have implemented the SIFT algorithm in OpenCV for feature detection and matching using the following steps:
My objective is to classify images into categories such as shoes, shirts, etc., based on their similarity. For example, two different heels should be more similar to each other than a heel and a sports shoe, or a heel and a t-shirt.
However, this algorithm works well only when my template image is present in the search image (at any scale and orientation). If I compare two different heels, they don't match well, and the matches are also random (the heel in one image matches the flat surface of the shoe in the other). There are also many false positives when I compare a heel with a sports shoe, a heel with a t-shirt, or even a heel with a picture of a baby!
I would like to look at a heel, identify it as a heel, and return how similar it is to the different images in my database, giving maximum similarity for other heels, followed by other shoes. It should not produce any similarity with irrelevant objects such as shirts, phones, or pens.
I understand that the SIFT algorithm produces a descriptor vector for each keypoint based on the gradient values of the pixels around the keypoint, and that images are matched purely using this attribute. Hence it is quite possible for a keypoint located near the heel of one shoe to be matched to a keypoint on the surface of the other shoe. What I gather, therefore, is that this algorithm can be used only to detect exact matches and not to measure similarity between images.
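To make the matching step described above concrete, here is a minimal pure-Python sketch of nearest-neighbour descriptor matching. It is not the code used in the question. The ratio filter (accept a match only if the best distance is clearly smaller than the second-best) and the value 0.75 are assumptions added for illustration, as are the function names; in OpenCV the same idea is usually built on `cv2.BFMatcher().knnMatch(des1, des2, k=2)`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(query_desc, train_desc, ratio=0.75):
    """Return (query_index, train_index) pairs that pass the ratio test.

    `ratio` is an assumed threshold, not a value taken from the question.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        # Rank every train descriptor by its distance to this query descriptor.
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(train_desc))
        if len(ranked) < 2:
            continue  # the ratio test needs a second-best candidate
        (best_d, best_i), (second_d, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best_d < ratio * second_d:
            matches.append((qi, best_i))
    return matches


def similarity_score(query_desc, train_desc, ratio=0.75):
    """Fraction of query descriptors that found an unambiguous match."""
    if not query_desc:
        return 0.0
    return len(ratio_test_matches(query_desc, train_desc, ratio)) / len(query_desc)
```

Note that nothing here knows which part of the object a keypoint came from: only local descriptor distance decides a match. That is consistent with the observation that a keypoint on one shoe's heel can pair with a keypoint on another shoe's flat surface.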
Could you please tell me whether this algorithm can be used for my objective and whether I am doing something wrong, or suggest another approach that I should use?
Upvotes: 3
Views: 3102
Reputation: 461
For classification of similar objects, I certainly would go for cascade classifiers.
Basically, a cascade classifier is a machine learning method in which you train a classifier to detect an object in different images. For it to work well, you need to train it with a lot of positive images (where your object is present) and negative images (where it is not). The method was invented by Viola and Jones in 2001.
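As a rough illustration of the "cascade" idea, here is a toy sketch, not OpenCV's implementation: a candidate window passes through a sequence of stages, and the first stage that rejects it ends the evaluation, so most negatives are discarded cheaply. In the Viola-Jones method each stage is a boosted classifier over Haar-like features; the single-threshold stages and the names `make_threshold_stage` and `cascade_predict` below are hypothetical stand-ins.

```python
def make_threshold_stage(feature_index, threshold):
    """Build a hypothetical stage that passes if one feature reaches a threshold."""
    def stage(features):
        return features[feature_index] >= threshold
    return stage


def cascade_predict(stages, features):
    """Run the stages in order and stop at the first rejection.

    Returns (accepted, number_of_stages_evaluated).
    """
    for n, stage in enumerate(stages, start=1):
        if not stage(features):
            return False, n  # rejected early, later stages are never run
    return True, len(stages)


# Three toy stages, each looking at a different (made-up) feature value.
stages = [
    make_threshold_stage(0, 0.5),
    make_threshold_stage(1, 0.5),
    make_threshold_stage(2, 0.5),
]
```

With these stages, `cascade_predict(stages, [0.1, 0.9, 0.9])` rejects at the first stage, while a window only counts as a detection if every stage accepts it.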
There is a ready-made implementation in OpenCV for face detection. You will find a bit more explanation in the OpenCV documentation (sorry, I can't post the link, I'm limited to 1 link for the moment).
Now, for the caveats:
First, you need a lot of positive and negative images. The more images you have, the better the algorithm will perform. Beware of overfitting: if your training dataset for heels contains, for instance, too many images of one particular model, other models may not be detected properly.
Second, training the cascade classifier can be slow and difficult. The end result will depend on how well you choose the training parameters. Some information on this can be found on this webpage: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
Upvotes: 3