python + opencv - how to properly compare images (via histograms)?

Question

I have a bunch of images (from the M.C. Escher collection) that I want to organize, so the first step I had in mind is to group them by comparing them (some have different resolutions, shapes, etc.).

I wrote a very crude script to:

* read the files
* compute their histograms
* compare them

But the quality of the comparison is really low: files that are completely different end up matching.

Take a look at what I wrote so far:

Preparing the histograms

import cv2

# Maps each file path to its normalized grayscale histogram.
files_hist = {}

for i, f in enumerate(files):
    try:
        frame = cv2.imread(f)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([frame],[0],None,[4096],[0,4096])
        cv2.normalize(hist, hist, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)

        files_hist[f] = hist
    except Exception as e:
        print('ERROR:', f, e)

Comparing the histograms

import itertools

# Every unordered pair of files that produced a histogram.
pairs = list(itertools.combinations(files_hist.keys(), 2))

for i, (f1, f2) in enumerate(pairs):
    correl = cv2.compareHist(files_hist[f1], files_hist[f2], cv2.HISTCMP_CORREL)

    if correl >= 0.999:
        print('MATCH:', correl, f1, f2)

Now, for example, I get a match for these two files:

m._c._escher_244_(1933).jpg

and

m._c._escher_208_(1931).jpg

and their correlation, using the code above, is 0.9996699595530539 (so they're practically the same :( )

What am I doing wrong? How can I improve that code to avoid these false matches?

thanks!
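
One thing that may be contributing, sketched here in plain Python rather than OpenCV: after cv2.COLOR_BGR2GRAY the image is 8-bit, so pixel values only span 0..255, and with 4096 bins over [0,4096] bins 256..4095 stay empty for every file. cv2.HISTCMP_CORREL is documented as a Pearson correlation over the bins, so those shared zeros count as agreement. The pearson helper and the two ramp histograms below are made up for illustration and are not taken from the Escher files.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences of bin counts."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Two hypothetical 256-bin histograms with opposite shapes.
hist_dark = [255 - i for i in range(256)]   # counts fall as brightness rises
hist_bright = [i for i in range(256)]       # counts rise with brightness

# The same histograms padded with the empty bins 256..4095.
pad = [0] * (4096 - 256)

r_256 = pearson(hist_dark, hist_bright)               # about -1
r_4096 = pearson(hist_dark + pad, hist_bright + pad)  # roughly 0.47

print('256 bins: ', r_256)
print('4096 bins:', r_4096)
```

In this toy case the two histograms are perfectly anti-correlated over 256 bins, yet the padded version scores clearly positive. If the same effect is at play here, asking cv2.calcHist for [256] bins over [0,256] would drop the empty bins, and the 0.999 threshold would probably need re-tuning afterwards. Histograms also ignore where the pixels sit in the image, so some false matches may remain either way.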
