Reputation: 976
I have a dataset of images. I compute the histogram of every image and want to store (write) the histograms to a file, so that for every new input image I can compare its histogram with the ones already in the file and check whether they are identical. The code so far is this:
import numpy as np
import cv2
import os.path
import glob
import matplotlib.pyplot as plt
import pickle
index = {}

# output dict
out = {
    1: {},
    2: {},
    3: {},
}

for t in [1]:
    # load files
    files = glob.glob(os.path.join("..", "data", "train", "Type_{}".format(t), "*.jpg"))
    no_files = len(files)

    # iterate and read
    for n, file in enumerate(files):
        try:
            image = cv2.imread(file)
            img = cv2.resize(image, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)

            # features: histograms
            plt.hist(img.flatten(), 256, [0, 256], color='r')
            plt.xlim([0, 256])
            plt.legend('histogram', loc='upper left')
            plt.show()

            # index[file] = hist
            # write histograms into file
            # compare them and find similarity score
            # result_dist = compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)
            print(file, t, "-files left", no_files - n)
        except Exception as e:
            print(e)
            print(file)
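What I had in mind for the commented-out parts (writing the index to a file and comparing histograms) is roughly the sketch below, using pickle and cv2.compareHist. The histograms.pkl file name and the grayscale/correlation choices are just placeholders I picked, so I am not sure this is the right approach:

import pickle
import cv2

def build_index(files):
    # one normalized histogram per image (grayscale, just for the sketch)
    index = {}
    for file in files:
        image = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
        hist = cv2.calcHist([image], [0], None, [256], [0, 256])
        index[file] = cv2.normalize(hist, hist).flatten()
    return index

def save_index(index, path="histograms.pkl"):
    # write the histograms to a file
    with open(path, "wb") as f:
        pickle.dump(index, f)

def compare_to_index(new_file, path="histograms.pkl"):
    # compare the histogram of a new image against the stored ones
    with open(path, "rb") as f:
        index = pickle.load(f)
    image = cv2.imread(new_file, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([image], [0], None, [256], [0, 256])
    hist = cv2.normalize(hist, hist).flatten()
    scores = {}
    for file, stored in index.items():
        # correlation metric: 1.0 means identical histograms
        scores[file] = cv2.compareHist(hist, stored, cv2.HISTCMP_CORREL)
    return scores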
Can someone guide me through this? Thanks!
Upvotes: 0
Views: 1241
Reputation: 13723
You could compute the red channel histogram of all the images like this:
import os
import glob
import numpy as np
from skimage import io

root = r'C:\Users\you\imgs'  # Change this appropriately (raw string avoids backslash escapes)
folders = ['Type_1', 'Type_2', 'Type_3']
extension = '*.bmp'  # Change if necessary

def compute_red_histograms(root, folders, extension):
    X = []
    y = []
    for n, imtype in enumerate(folders):
        filenames = glob.glob(os.path.join(root, imtype, extension))
        for fn in filenames:
            img = io.imread(fn)
            red = img[:, :, 0]
            # 256-bin normalized histogram (density=True replaces the deprecated normed=True)
            h, _ = np.histogram(red, bins=np.arange(257), density=True)
            X.append(h)
            y.append(n)
    return np.vstack(X), np.array(y)

X, y = compute_red_histograms(root, folders, extension)
Each image is represented by a 256-dimensional feature vector (the components of the red channel histogram), hence X is a 2D NumPy array with as many rows as there are images in your dataset and 256 columns. y is a 1D NumPy array of numeric class labels, i.e. 0 for Type_1, 1 for Type_2 and 2 for Type_3.
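As a quick sanity check (the shapes in the comments are illustrative; the number of rows depends on how many images you actually have):

print(X.shape)       # (number_of_images, 256)
print(np.unique(y))  # [0 1 2]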
Next you could split your dataset into train and test like so:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
And finally, you could train an SVM classifier:
from sklearn.svm import SVC
clf = SVC()
clf.fit(X_train, y_train)
By doing so you can make predictions or assess classification accuracy very easily:
In [197]: y_test
Out[197]: array([0, 2, 0, ..., 0, 0, 1])
In [198]: clf.predict(X_test)
Out[198]: array([2, 2, 2, ..., 2, 2, 2])
In [199]: y_test == clf.predict(X_test)
Out[199]: array([False, True, False, ..., False, False, False], dtype=bool)
In [200]: clf.score(X_test, y_test)
Out[200]: 0.3125
Upvotes: 1