Sandro Tosi

Reputation: 71

python + opencv - how to properly compare images (via histograms)?

I have a bunch of images (from the M.C. Escher collection) that I want to organize. The first step I had in mind is to group them by comparing them (some have different resolutions, shapes, etc.).

I wrote a very crude script to:

  • read the files

  • compute their histograms

  • compare them

But the quality of the comparison is really low: files that are absolutely different come out as matches.

Take a look at what I wrote so far:

Preparing the histograms

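import cv2

# `files` is a list of image paths, defined elsewhere
files_hist = {}

for i, f in enumerate(files):
    try:
        # cv2.imread returns None for unreadable files; cvtColor then raises
        frame = cv2.imread(f)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # 4096 bins over [0, 4096); note that an 8-bit grayscale image only
        # holds values 0-255, so every bin above 255 stays empty
        hist = cv2.calcHist([frame],[0],None,[4096],[0,4096])
        # rescale the histogram in place to the range [0, 1]
        cv2.normalize(hist, hist, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)

        files_hist[f] = hist
    except Exception as e:
        print('ERROR:', f, e)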

Comparing the histograms

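import itertools

# every unordered pair of files that has a histogram
pairs = list(itertools.combinations(files_hist.keys(), 2))

for i, (f1, f2) in enumerate(pairs):
    # correlation of the two histograms; 1.0 means identical histograms
    correl = cv2.compareHist(files_hist[f1], files_hist[f2], cv2.HISTCMP_CORREL)

    if correl >= 0.999:
        print('MATCH:', correl, f1, f2)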

Now, for example, I get a match for these two files:

m._c._escher_244_(1933).jpg

and

m._c._escher_208_(1931).jpg

and their correlation, using the code above, is 0.9996699595530539 (so they're practically the same :( )

What am I doing wrong? How can I improve the code to avoid these false matches?

Thanks!

Upvotes: 3

Views: 3589

Answers (1)

Heitor Boschirolli

Reputation: 111

Histograms are not a good way to compare images. In black and white images, for example, if two images have the same number of black pixels, their histograms will be identical regardless of how those pixels are distributed in the image (that is why the images you mentioned are classified as almost equal).
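To make this concrete, here is a small sketch with two made-up 4x4 arrays (not the Escher files): both have eight white pixels, arranged very differently, and their histograms come out identical. It uses NumPy only, and the array names are just for illustration.

```python
import numpy as np

# Two 4x4 "black and white" images with the same number of white pixels
# (8 each) but arranged completely differently.
left_half = np.zeros((4, 4), dtype=np.uint8)
left_half[:, :2] = 255          # white left half, black right half

checkerboard = np.zeros((4, 4), dtype=np.uint8)
checkerboard[::2, ::2] = 255    # white on even rows / even columns
checkerboard[1::2, 1::2] = 255  # white on odd rows / odd columns

# 256-bin histograms over the 8-bit value range 0..255.
hist_a = np.bincount(left_half.ravel(), minlength=256)
hist_b = np.bincount(checkerboard.ravel(), minlength=256)

histograms_equal = np.array_equal(hist_a, hist_b)
pixels_equal = np.array_equal(left_half, checkerboard)
print(histograms_equal, pixels_equal)  # True False
```

The histogram only counts how often each gray level occurs, so any rearrangement of the same pixels gives the same result.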

There are better ways to quantify the difference between images. This post mentions a good option:

  • Load both images as arrays (scipy.misc.imread) and calculate an element-wise (pixel-by-pixel) difference. Calculate the norm of the difference.
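A minimal sketch of that idea, assuming grayscale images of the same shape. The quoted post does not say which norm to use, so the L1 norm (sum of absolute differences) here is my choice, and difference_norm is a hypothetical helper name. Note that scipy.misc.imread has been removed from recent SciPy releases; since the question already uses OpenCV, cv2.imread(path, cv2.IMREAD_GRAYSCALE) is one way to load the arrays instead.

```python
import numpy as np

def difference_norm(image1, image2):
    """L1 norm (sum of absolute differences) of two same-shape grayscale arrays."""
    if image1.shape != image2.shape:
        raise ValueError("images must have the same shape")
    # Convert to float first: subtracting uint8 arrays would wrap around.
    diff = image1.astype(np.float64) - image2.astype(np.float64)
    return float(np.abs(diff).sum())

# Tiny made-up arrays standing in for loaded images, e.g.
# image1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
a = np.array([[0, 255], [10, 20]], dtype=np.uint8)
b = np.array([[0, 0], [20, 20]], dtype=np.uint8)

print(difference_norm(a, a))  # 0.0
print(difference_norm(a, b))  # 265.0
```

The cast to float matters: with uint8 inputs, 10 - 20 would wrap to 246 instead of giving -10.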

edit:

Answering some questions:

I take it the zero norm per pixel is going to be a 0.0-1.0 value, with values close to 0.0 meaning "images are the same", correct?

Values close to 0.0 mean the pixels are the same. To compare the images as a whole you need to sum over all pixels. If the summed value is close to 0.0, the images are almost the same.

what if the 2 image sizes are different?

That's a good one. To calculate the norm of the difference, the images must have the same size. I see two ways to achieve that:

  • The first would be resizing one of the images to the shape of the other one; the problem is that this can distort the image.

  • The second would be padding the smaller image with zeros until the sizes match.
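The padding option could look roughly like this. It is a sketch assuming 2-D grayscale arrays; pad_to_common_shape is a hypothetical helper, and padding at the bottom and right (rather than centering) is an arbitrary choice. Each image is padded up to the larger height and the larger width, which also covers the case where neither image is smaller in both dimensions.

```python
import numpy as np

def pad_to_common_shape(image1, image2):
    """Zero-pad two 2-D arrays (bottom and right) so both share one shape."""
    height = max(image1.shape[0], image2.shape[0])
    width = max(image1.shape[1], image2.shape[1])

    def pad(image):
        pad_rows = height - image.shape[0]
        pad_cols = width - image.shape[1]
        # ((before, after) for rows, (before, after) for columns)
        return np.pad(image, ((0, pad_rows), (0, pad_cols)),
                      mode="constant", constant_values=0)

    return pad(image1), pad(image2)

# Stand-in arrays with the shapes from the error message further down.
image1 = np.ones((850, 534), dtype=np.uint8)
image2 = np.ones((663, 650), dtype=np.uint8)

padded1, padded2 = pad_to_common_shape(image1, image2)
print(padded1.shape, padded2.shape)  # (850, 650) (850, 650)

# The subtraction no longer fails to broadcast (cast to avoid uint8 wrap-around).
diff = padded1.astype(np.int32) - padded2.astype(np.int32)
```

For the resizing option, OpenCV's cv2.resize takes the target size as (width, height), the reverse of NumPy's (rows, columns) shape order.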

Note: if you sum the pixel-wise norm (with per-pixel values scaled to 0.0-1.0), you get a value between zero and the number of pixels in the image. This can be confusing if you are comparing multiple images. For example, suppose you are comparing images A and B and both have shape 50x50 (so each image has 2500 pixels); values close to 2500 mean the images are completely different. Now suppose you are comparing images C and D and both have shape 1000x1000; in this case, a value like 2500 would mean the images are similar. To overcome this problem you can divide the pixel-wise sum by the number of pixels in the image. This gives a value between 0.0 and 1.0, with 0.0 meaning the images are the same and 1.0 meaning they are completely different.
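As a sketch of that normalization, assuming 8-bit grayscale arrays of equal shape (normalized_difference is a hypothetical name): scale the pixels to 0.0-1.0, take the absolute difference, and average over all pixels. The two cases below mirror the 50x50 and 1000x1000 examples above.

```python
import numpy as np

def normalized_difference(image1, image2):
    """Mean absolute per-pixel difference, in [0.0, 1.0], for 8-bit images."""
    if image1.shape != image2.shape:
        raise ValueError("images must have the same shape")
    # Scale 0..255 to 0.0..1.0 so each pixel contributes at most 1.0.
    a = image1.astype(np.float64) / 255.0
    b = image2.astype(np.float64) / 255.0
    # mean() is the pixel-wise sum divided by the number of pixels.
    return float(np.abs(a - b).mean())

# 50x50: all 2500 pixels differ completely.
black_small = np.zeros((50, 50), dtype=np.uint8)
white_small = np.full((50, 50), 255, dtype=np.uint8)
small_score = normalized_difference(black_small, white_small)

# 1000x1000: only 2500 of the 1,000,000 pixels differ completely.
black_large = np.zeros((1000, 1000), dtype=np.uint8)
mostly_black = black_large.copy()
mostly_black[:50, :50] = 255
large_score = normalized_difference(black_large, mostly_black)

print(small_score)  # 1.0
print(large_score)  # 0.0025
```

Both pairs have a raw summed difference of 2500, but after dividing by the pixel count the scores are directly comparable.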

yeah here's the error i received when comparing 2 images with different size:

diff = image1 - image2
ValueError: operands could not be broadcast together with shapes (850,534) (663,650)

This happens because the images have different shapes. Resizing or padding can avoid this error (as mentioned above).

Upvotes: 4
