Reputation: 1
I am working on a binary image classification problem using supervised machine learning, with an SVM classifier. First I created a NumPy array of normalized color images in a variable X, whose shape is (17500, 32, 32, 3). After splitting the data, X_train has shape (14000, 32, 32, 3) (4 dimensions) and y_train has shape (14000, 2) (2 dimensions).
clf.fit(X_train,y_train)
After running this code I got a ValueError: Found array with dim 4. Estimator expected <= 2.
Thanks in advance!
Upvotes: 0
Views: 510
Reputation: 242
This technique is called dimensionality reduction: mapping data from a high-dimensional space into a lower-dimensional one. The most commonly used method is Principal Component Analysis (PCA). You can learn more from the following links:
https://towardsdatascience.com/feature-selection-and-dimensionality-reduction-f488d1a035de
https://www.quora.com/What-dimensionality-reduction-methods-would-you-recommend
This tutorial walks through PCA with an example on a dataset similar to yours: https://www.datacamp.com/community/tutorials/principal-component-analysis-in-python
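As a rough illustration of PCA before an SVM, here is a minimal sketch. It uses small random arrays as a stand-in for the real images, and the choice of 50 components is arbitrary; both are assumptions, not values from the question. Note that PCA in scikit-learn also needs a 2D (n_samples, n_features) array, so the images still have to be flattened first.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the real images (assumption): 200 samples of 32x32 RGB.
rng = np.random.default_rng(0)
X = rng.random((200, 32, 32, 3))
y = rng.integers(0, 2, size=200)

# PCA, like SVC, needs a 2D (n_samples, n_features) array, so flatten first.
X_flat = X.reshape(len(X), -1)  # shape (200, 3072)

# Reduce 3072 features to 50 components (an arbitrary choice), then fit the SVM.
model = make_pipeline(PCA(n_components=50), SVC())
model.fit(X_flat, y)

# Inspect the reduced representation produced by the fitted PCA step.
X_reduced = model.named_steps["pca"].transform(X_flat)
print(X_flat.shape, X_reduced.shape)
```

Wrapping both steps in a pipeline means the same PCA projection is applied automatically when predicting on new data.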
Upvotes: 0
Reputation: 440
If you are using the scikit-learn SVM classifier, its fit function expects the training data as a 2D array of shape (n_samples, n_features).
The dataset you are passing in is a 4D array, therefore you need to reshape the array into a 2D array.
Example:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# To apply a classifier, we need to flatten each image to
# turn the data into a (n_samples, n_features) matrix.
# Assuming data is a numpy array of shape (17500, 32, 32, 3), this converts it to shape (17500, 3072).
n_samples = len(data)
data_reshape = data.reshape((n_samples, -1))
# Split data into train and test subsets
X_train, X_test, y_train, y_test = train_test_split(data_reshape, labels,
                                                    test_size=0.2)
# Create the classifier (default parameters) and fit it on the flattened training data
clf = SVC()
clf.fit(X_train, y_train)
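One more thing to watch for: the question reports y_train with shape (14000, 2), which suggests one-hot encoded labels. SVC expects y as a 1D array of class labels, so the labels likely need collapsing as well. Below is a minimal end-to-end sketch under that assumption, using small random arrays in place of the real data; the names class_ids and labels_onehot are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 100 images, one-hot labels of shape (100, 2).
rng = np.random.default_rng(0)
data = rng.random((100, 32, 32, 3))
class_ids = rng.integers(0, 2, size=100)
labels_onehot = np.eye(2)[class_ids]

# Flatten images to (n_samples, 3072) and collapse one-hot rows to class indices.
data_reshape = data.reshape(len(data), -1)
labels = np.argmax(labels_onehot, axis=1)  # shape (100,)

X_train, X_test, y_train, y_test = train_test_split(
    data_reshape, labels, test_size=0.2, random_state=0
)

clf = SVC()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(X_train.shape, y_train.shape, predictions.shape)
```

If your labels are already a 1D array of 0s and 1s, skip the argmax step.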
Upvotes: 2