joasa

Reputation: 976

Compare multiple histograms in file OpenCV

I have a dataset of images. I compute the histogram of every image and want to write the histograms to a file, so that for every new input image I can compare its histogram with the stored ones and find out whether any of them is identical. My code so far is this:

import numpy as np
import cv2
import os.path
import glob
import matplotlib.pyplot as plt
import pickle

index = {}

#output dic
out = {
    1: {},
    2: {},
    3: {},
}

for t in [1]:

    #load_files
    files = glob.glob(os.path.join("..", "data", "train", "Type_{}".format(t), "*.jpg"))
    no_files = len(files)

    #iterate and read
    for n, file in enumerate(files):
        try:
            image = cv2.imread(file)
            img = cv2.resize(image, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)

            # features : histograms
            plt.hist(img.flatten(), 256, [0, 256], color='r')
            plt.xlim([0,256])
            plt.legend('histogram', loc='upper left')
            plt.show()
            # index[file] = hist

            # write histograms into file
            #compare them and find similarity score
            # result_dist = compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)

            print(file, t, "-files left", no_files - n)

        except Exception as e:
            print(e)
            print(file)

Can someone guide me through this? Thanks!

Upvotes: 0

Views: 1241

Answers (1)

Tonechas

Reputation: 13723

You could compute the red channel histogram of all the images like this:

import os
import glob
import numpy as np
from skimage import io

root = r'C:\Users\you\imgs'  # Raw string so backslashes are not treated as escapes; change this appropriately
folders = ['Type_1', 'Type_2', 'Type_3']
extension = '*.bmp'  # Change if necessary

def compute_red_histograms(root, folders, extension):
    X = []
    y = []
    for n, imtype in enumerate(folders):
        filenames = glob.glob(os.path.join(root, imtype, extension))    
        for fn in filenames:
            img = io.imread(fn)
            red = img[:, :, 0]
            # density=True replaces the removed normed=True argument
            h, _ = np.histogram(red, bins=np.arange(257), density=True)
            X.append(h)
            y.append(n)
    return np.vstack(X), np.array(y)

X, y = compute_red_histograms(root, folders, extension)

Each image is represented by a 256-dimensional feature vector (the components of its red channel histogram), so X is a 2D NumPy array with one row per image in your dataset and 256 columns. y is a 1D NumPy array of numeric class labels, i.e. 0 for Type_1, 1 for Type_2 and 2 for Type_3.
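Since the question asks about storing the histograms in a file, here is a minimal sketch of one way to persist X and y with NumPy's `.npz` format. The helper names `save_histograms` and `load_histograms`, the file name, and the random stand-in data are assumptions made for illustration, not part of the code above.

```python
import os
import tempfile
import numpy as np

def save_histograms(path, X, y):
    """Store the feature matrix and the labels in a single .npz archive."""
    np.savez(path, X=X, y=y)

def load_histograms(path):
    """Read the feature matrix and the labels back from the archive."""
    with np.load(path) as data:
        return data["X"], data["y"]

# Stand-in data (hypothetical): 4 fake images with 256-bin histograms.
rng = np.random.default_rng(0)
X_demo = rng.random((4, 256))
y_demo = np.array([0, 0, 1, 2])

# Write to a temporary directory and read the arrays back.
path = os.path.join(tempfile.mkdtemp(), "histograms.npz")
save_histograms(path, X_demo, y_demo)
X_loaded, y_loaded = load_histograms(path)
```

In practice you would pass the X and y returned by compute_red_histograms instead of the stand-in arrays, and keep the file somewhere permanent.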

Next you could split your dataset into train and test like so:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

And finally, you could train an SVM classifier:

from sklearn.svm import SVC

clf = SVC()
clf.fit(X_train, y_train)

By doing so you can make predictions or assess classification accuracy very easily:

In [197]: y_test
Out[197]: array([0, 2, 0, ..., 0, 0, 1])

In [198]: clf.predict(X_test)
Out[198]: array([2, 2, 2, ..., 2, 2, 2])

In [199]: y_test == clf.predict(X_test)
Out[199]: array([False,  True, False, ..., False, False, False], dtype=bool)

In [200]: clf.score(X_test, y_test)
Out[200]: 0.3125
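The classifier predicts an image type rather than telling you whether a new image matches a stored one. If you want the direct comparison the question describes, a rough sketch is to score the new histogram against every row of X with the Pearson correlation, which is the measure behind OpenCV's correlation comparison method. The function names and the random demo data below are assumptions for illustration, and the sketch assumes no histogram is perfectly flat (that would make the denominator zero).

```python
import numpy as np

def correlation_scores(X, h):
    """Pearson correlation between histogram h and every row of X."""
    Xc = X - X.mean(axis=1, keepdims=True)  # center each stored histogram
    hc = h - h.mean()                       # center the query histogram
    denom = np.sqrt((Xc ** 2).sum(axis=1) * (hc ** 2).sum())
    return (Xc @ hc) / denom

def best_match(X, h):
    """Return the index of the most similar stored histogram and its score."""
    scores = correlation_scores(X, h)
    i = int(np.argmax(scores))
    return i, float(scores[i])

# Stand-in data (hypothetical): 5 normalized 256-bin histograms.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 256))
X_demo /= X_demo.sum(axis=1, keepdims=True)

# Query with a copy of a stored histogram; it should match its own row.
query = X_demo[3].copy()
idx, score = best_match(X_demo, query)
```

A score very close to 1 indicates identical (or linearly rescaled) histograms; which threshold counts as "identical" for real images is a choice you would have to tune on your own data.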

Upvotes: 1
