Reputation: 29
I am using GridSearchCV() and its fit() method to build a model. I currently have this working, but would like to improve the accuracy of the model by supplying more images to train on. Right now, fit() takes over an hour to complete with 500 images, and processing time grows roughly exponentially as the number of images doubles. Ultimately, I'd like to train on several thousand images and even include additional categories besides the two in my proof of concept. I have tried several ways to improve performance and can't resolve it. The only thing that reduces processing time is to significantly lower train_size/test_size in train_test_split(), but doing that defeats the purpose of a larger data set to train from. I'm a little stumped on this one. Below is the code I'm using for reference. Thank you.
import os
import numpy as np
import pandas as pd
from skimage.io import imread
from skimage.transform import resize
from sklearn import svm
from sklearn.model_selection import GridSearchCV, train_test_split

categories = ['Cat', 'Dog']
flat_data_arr = []
target_arr = []
datadir = 'C:\\Users\\Name\\Python\\images'

# read each image, resize it to a fixed shape, and flatten it into one row
for i in categories:
    path = os.path.join(datadir, i)
    for image in os.listdir(path):
        image_array = imread(os.path.join(path, image))
        image_resized = resize(image_array, (150, 150, 3))
        flat_data_arr.append(image_resized.flatten())
        target_arr.append(categories.index(i))

flat_data = np.array(flat_data_arr)
target = np.array(target_arr)
df = pd.DataFrame(flat_data)
df['Target'] = target
x = df.iloc[:, :-1]
y = df.iloc[:, -1]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.75, test_size=0.25, shuffle=True, stratify=y)

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.0001, 0.001, 0.1, 1],
              'kernel': ['rbf', 'poly']}
svc = svm.SVC(probability=True)
model = GridSearchCV(svc, param_grid)
model.fit(x_train, y_train)  # this takes hours depending on number of images
Upvotes: 1
Views: 591
Reputation: 1539
Try HalvingGridSearchCV, which evaluates all parameter combinations on a small amount of data first and only keeps the best candidates for the expensive full fits: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.HalvingGridSearchCV.html
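A minimal sketch of that swap (HalvingGridSearchCV is still experimental, so the explicit enable import is required; svc, param_grid, x_train and y_train are the objects from your question):

from sklearn.experimental import enable_halving_search_cv  # noqa: must come before the import below
from sklearn.model_selection import HalvingGridSearchCV

# keeps roughly the best 1/3 of the candidates at each round, with more data per round
model = HalvingGridSearchCV(svc, param_grid, factor=3)
model.fit(x_train, y_train)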
Also...
It is probably best to use TensorFlow/Keras or PyTorch for computer vision; with a GPU on top this will run in milliseconds per batch, and even without a GPU you will see a significant speed-up.
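A rough sketch of the Keras route, assuming TensorFlow is installed and the images stay in one sub-folder per class as in your question (the layer sizes here are illustrative, not tuned):

import tensorflow as tf

# builds a labelled dataset straight from the class sub-folders
train_ds = tf.keras.utils.image_dataset_from_directory(
    'C:\\Users\\Name\\Python\\images', image_size=(150, 150), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(150, 150, 3)),
    tf.keras.layers.Conv2D(16, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2)  # two classes: Cat, Dog
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_ds, epochs=10)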
However, if you decide to stick with scikit-learn, you could try the following (basically reducing dimensionality and adding features):
import os
import numpy as np
from skimage.io import imread
from skimage.transform import resize
from skimage.color import rgb2gray  # rgb2grey has been renamed to rgb2gray
from skimage.feature import hog
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

final_features_list = []
for image in os.listdir(path):
    # resize so every image yields a feature vector of the same length
    image_array = resize(imread(os.path.join(path, image)), (150, 150, 3))
    # HOG features computed on the greyscale image
    grey_scaled = rgb2gray(image_array)
    hog_features = hog(grey_scaled, block_norm='L2-Hys', pixels_per_cell=(10, 10))
    # raw colour pixels, flattened
    color_features = image_array.flatten()
    final_features_list.append(np.hstack((color_features, hog_features)))

standard_sc = StandardScaler()
matrix_scaled = standard_sc.fit_transform(np.array(final_features_list))

### read up on how to select # of components
### there are methods to help you with that
pca = PCA(n_components=300)
matrix_scaled_pca = pca.fit_transform(matrix_scaled)
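If you go this route, a sketch of how the scaling and PCA steps could be wired back into the grid search using a Pipeline, so the whole chain is cross-validated together (feature_matrix and labels are placeholders for the feature array built above and its matching target vector):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# scaler + PCA + SVM as one estimator; hyper-parameters use the step__parameter naming
pipe = Pipeline([('scale', StandardScaler()),
                 ('pca', PCA(n_components=300)),
                 ('svc', svm.SVC())])
param_grid = {'svc__C': [0.1, 1, 10, 100],
              'svc__gamma': [0.0001, 0.001, 0.1, 1],
              'svc__kernel': ['rbf', 'poly']}
model = GridSearchCV(pipe, param_grid, n_jobs=-1)  # n_jobs=-1 uses all CPU cores
model.fit(feature_matrix, labels)  # placeholders: features from above + their targets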
Best of luck,
Upvotes: 1