Aurélien Pierre

Reputation: 713

Compute similarity between images using a hash to detect near-duplicates

Let's say I have a massive SQL database indexing image files and the files themselves. Some files could be indexed twice or more, and some may have a corrupted copy or a more recent version indexed along with the original file.

Detecting exact duplicates can be done easily by computing the MD5 hash of the files, but is there a similar method to detect near-duplicates (files that are strongly similar without being exactly the same), so they can be removed from the database?
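For reference, a minimal sketch of the exact-duplicate pass I have in mind, assuming the indexed files are readable from disk (the function names are just for illustration):

```python
# Group files by MD5 digest; any group with more than one path is a set of exact duplicates.
import hashlib
from collections import defaultdict
from pathlib import Path

def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def exact_duplicates(paths):
    """Map digest -> list of paths, keeping only groups with more than one file."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(path)
    return {d: ps for d, ps in groups.items() if len(ps) > 1}
```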

To be clear, I want to avoid at all costs things like computing the Euclidean distance for every pair of images in the database, which would just take ages.

Upvotes: 1

Views: 724

Answers (1)

Kornel

Reputation: 100190

For searches in SQL, the most convenient way is to compute a perceptual hash for each image, store it in a column, and use it to find potential duplicates. For slightly better results, you can compute several variants of the perceptual hash per image and count how many of them match (a Jaccard-style distance).
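As a sketch only, one way to wire this up with the third-party `imagehash` library (plus Pillow) and SQLite; the `images` table and its `phash`/`dhash`/`ahash` columns are assumptions made for the example:

```python
# Store several perceptual hash variants per image in SQL, then look up
# candidates by exact match on any variant and count how many variants agree.
import sqlite3
from PIL import Image
import imagehash

def hash_variants(path: str) -> dict:
    """Compute three perceptual hash variants of one image as hex strings."""
    img = Image.open(path)
    return {
        "phash": str(imagehash.phash(img)),         # DCT-based perceptual hash
        "dhash": str(imagehash.dhash(img)),         # gradient (difference) hash
        "ahash": str(imagehash.average_hash(img)),  # mean-based hash
    }

def candidate_duplicates(conn: sqlite3.Connection, path: str, min_matches: int = 2):
    """Return ids of stored images that agree with this image on at least
    `min_matches` of the three hash variants."""
    h = hash_variants(path)
    rows = conn.execute(
        "SELECT id, phash, dhash, ahash FROM images"
        " WHERE phash = ? OR dhash = ? OR ahash = ?",
        (h["phash"], h["dhash"], h["ahash"]),
    ).fetchall()
    candidates = []
    for image_id, phash, dhash, ahash in rows:
        matches = (phash == h["phash"]) + (dhash == h["dhash"]) + (ahash == h["ahash"])
        if matches >= min_matches:
            candidates.append(image_id)
    return candidates
```

Indexing each hash column keeps the lookup a plain equality search, which is what makes this cheap compared with pairwise comparisons.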

There are dedicated libraries for perceptual hashing. If you want to roll your own and don't need to detect variants that are cropped or rotated, a simple approach is to resize all images to 32×32, maximize contrast, posterize, and hash the resulting pixels.
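A hand-rolled version of that recipe might look like the following, assuming Pillow; the 32×32 size and 2-bit posterization are arbitrary choices, and only images that reduce to the same quantized thumbnail will hash identically:

```python
# Roll-your-own perceptual hash: shrink, maximize contrast, posterize,
# then hash the quantized pixels. Not robust to cropping or rotation.
import hashlib
from PIL import Image, ImageOps

def simple_perceptual_hash(path: str) -> str:
    img = Image.open(path).convert("L")   # grayscale
    img = img.resize((32, 32))            # normalize size
    img = ImageOps.autocontrast(img)      # stretch contrast to the full range
    img = ImageOps.posterize(img, 2)      # keep only 2 bits per pixel
    return hashlib.md5(img.tobytes()).hexdigest()  # hash the quantized pixel data
```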

If you don't need to use SQL, then it is feasible to quickly find duplicates based only on the Euclidean distance between pairs of images, even if you have millions of them, by using a vantage-point tree. It's roughly a binary tree that splits points into near and far halves, so each comparison roughly halves the number of images you need to search.
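A minimal sketch of that idea, for illustration only: it builds a vantage-point tree over (id, vector) pairs, where each vector could be a flattened small thumbnail, and prunes whole subtrees that cannot contain anything within the search radius.

```python
# Illustrative vantage-point tree for range search under Euclidean distance.
# Each item is an (id, vector) pair, e.g. an image id and a flattened 32x32 thumbnail.
import random
import numpy as np

class VPNode:
    def __init__(self, item, radius=0.0, inside=None, outside=None):
        self.item = item        # the vantage point: (id, vector)
        self.radius = radius    # median distance from the vantage point to the rest
        self.inside = inside    # subtree of items within the radius
        self.outside = outside  # subtree of items beyond the radius

def build(items):
    """Recursively build the tree by picking a random vantage point and
    splitting the remaining items at the median distance."""
    if not items:
        return None
    items = list(items)
    vp = items.pop(random.randrange(len(items)))
    if not items:
        return VPNode(vp)
    dists = [np.linalg.norm(vp[1] - it[1]) for it in items]
    radius = float(np.median(dists))
    inside = [it for it, d in zip(items, dists) if d <= radius]
    outside = [it for it, d in zip(items, dists) if d > radius]
    return VPNode(vp, radius, build(inside), build(outside))

def range_search(node, query, max_dist, results):
    """Collect every (id, distance) within max_dist of the query vector."""
    if node is None:
        return
    d = np.linalg.norm(node.item[1] - query)
    if d <= max_dist:
        results.append((node.item[0], d))
    # Triangle inequality: skip subtrees that cannot hold a close-enough point.
    if d - max_dist <= node.radius:
        range_search(node.inside, query, max_dist, results)
    if d + max_dist >= node.radius:
        range_search(node.outside, query, max_dist, results)
```

At scale you would want an iterative search (to avoid Python's recursion limit) or an existing library, but the pruning step is what avoids comparing the query against every image.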

Upvotes: 1
