Reputation: 185
I know there are already some posts on this site related to this question, but none (as far as I can tell) tell me quite what I need to know.
I am interested in how image search engines (like Google Images) run their image-based searching. So far I have found this blog post, which shows how to write a fingerprinting function that will find similar images. However, the algorithm there only finds images that are either the same image at a different resolution or the same image with a slight change to it. I'm looking for a way to put in an image, say a picture of a forest, and get back other images of forests.
I am a beginner at this, so I am hopefully looking for something detailed: not code that does it for me, just a guide to get me started. Any help would be appreciated.
Upvotes: 4
Views: 3154
Reputation: 1533
One common approach to image retrieval is actually inspired by text retrieval, so I will start by quickly reviewing text retrieval: given a query q, the most similar documents in the database are returned, using an inverted index. The similarity between a document and the query q is often computed as the dot product of the two vectors representing the query and the document. (The tf-idf weighting is often used to build the vectors representing the documents.)
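To make that concrete, here is a minimal Python sketch of the scoring step (not from any particular system); the toy documents and query are placeholders, and scikit-learn's TfidfVectorizer stands in for the vector-building step:

```python
# A minimal sketch of the text-retrieval scoring described above.
# The documents and query are made-up placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "a forest of tall pine trees",
    "a busy city street at night",
    "sunlight falling through forest trees",
]
query = "trees in a forest"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)   # one tf-idf vector per document
query_vector = vectorizer.transform([query])

# The vectors are L2-normalized by default, so the dot product between
# the query vector and each document vector is their cosine similarity.
scores = (doc_vectors @ query_vector.T).toarray().ravel()
for i in np.argsort(scores)[::-1]:
    print(f"{scores[i]:.3f}  {docs[i]}")
```

(An explicit inverted index is omitted here; the sparse tf-idf matrix plays that role at this toy scale.)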
Image retrieval, as proposed by Sivic and Zisserman in Video Google: A Text Retrieval Approach to Object Matching in Videos, follows exactly the same approach. The only difference is the first step, where they define what a "visual word" is, in order to have a bag-of-words representation for images.
They start by extracting local features of the image, such as SIFT. These local features are high-dimensional vectors, so a clustering algorithm such as k-means is applied to obtain k visual words: the k cluster centers are the "visual words". Then, given an image, its local features (SIFT) are extracted and each one is assigned to the closest visual word (cluster center), thus obtaining a bag-of-words representation.
This method was later refined; see, for example, Hamming Embedding and Weak Geometric Consistency for Large-Scale Image Search by Hervé Jégou, Matthijs Douze and Cordelia Schmid.
If you want to learn more on those methods, I strongly advise you to have a look at the material from the Visual Recognition and Machine Learning Summer School, in particular the slides for "instance-level recognition" and "large-scale visual search".
A code-along YouTube video is also available for this.
Upvotes: 3