Reputation: 1

How to fix the incorrect dimension of a NumPy array

I am working on a binary image classification problem using supervised machine learning, and I am using an SVM classifier. First, I created a NumPy array X of normalized color images, whose shape is (17500, 32, 32, 3). After splitting the data, X_train has the shape (14000, 32, 32, 3) and 4 dimensions, and y_train has the shape (14000, 2) and 2 dimensions.

clf.fit(X_train,y_train)

After running this code, I got a ValueError: found array with dimension 4, but the estimator expected dimension <= 2.

Thanks in advance!

Upvotes: 0

Answers (2)

avr_dude

Reputation: 242

The technique is called dimensionality reduction: mapping data from a high-dimensional space into a lower-dimensional one. The most commonly used technique is Principal Component Analysis (PCA). You can learn about it through the following links:

https://towardsdatascience.com/feature-selection-and-dimensionality-reduction-f488d1a035de
https://www.quora.com/What-dimensionality-reduction-methods-would-you-recommend

This link explains dimensionality reduction with an example that uses a dataset similar to yours: https://www.datacamp.com/community/tutorials/principal-component-analysis-in-python
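Note that scikit-learn's PCA, like SVC, also expects a 2D input of shape (n_samples, n_features), so the images still have to be flattened before they can be reduced. A minimal sketch, using random data in place of the real images; the sample count, `n_components=100`, and the labels are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Hypothetical stand-in for the real images: 200 random 32x32 RGB images in [0, 1)
rng = np.random.default_rng(0)
X = rng.random((200, 32, 32, 3))
y = rng.integers(0, 2, size=200)  # hypothetical binary labels, shape (200,)

# PCA also needs a 2D (n_samples, n_features) matrix, so flatten each image first
X_flat = X.reshape(len(X), -1)  # shape (200, 3072)

# Project the 3072 pixel features onto the first 100 principal components
pca = PCA(n_components=100)
X_reduced = pca.fit_transform(X_flat)  # shape (200, 100)

# Train the SVM on the reduced features
clf = SVC()
clf.fit(X_reduced, y)

print(X_flat.shape, X_reduced.shape)  # (200, 3072) (200, 100)
```

At prediction time, new images would need the same flattening followed by `pca.transform`, not a fresh `fit_transform`.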

Upvotes: 0

shaivikochar

Reputation: 440

If you are using scikit-learn's SVM classifier, its fit method expects a 2D array of shape (n_samples, n_features) for the training data.

The dataset you are passing in is a 4D array, so you need to reshape it into a 2D array.

Example:

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# To apply a classifier, we need to flatten each image to
# turn the data into a (samples, features) matrix.
# Assuming data is a numpy array of shape (17500, 32, 32, 3), convert it to shape (17500, 3072).
n_samples = len(data)
data_reshape = data.reshape((n_samples, -1))

# Split data into train and test subsets
X_train, X_test, y_train, y_test = train_test_split(data_reshape, labels,
                                                    test_size=0.2)

# Create the SVM classifier and fit it on the flattened training data
clf = SVC()
clf.fit(X_train, y_train)
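Once X is 2D, the labels are the next thing to check: the question's y_train has shape (14000, 2), which suggests one-hot encoded labels. SVC.fit expects a 1D y of shape (n_samples,), so scikit-learn would then reject the 2D array with a ValueError saying y should be a 1d array. A minimal sketch, assuming the labels really are one-hot; the random data and array sizes below are hypothetical stand-ins for the real dataset:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-ins: 100 random 32x32 RGB images and one-hot binary labels
rng = np.random.default_rng(42)
X_train = rng.random((100, 32, 32, 3))
class_idx = rng.integers(0, 2, size=100)
y_train = np.eye(2)[class_idx]  # one-hot labels, shape (100, 2)

# Flatten images to (n_samples, n_features); reshape keeps the sample order,
# so row i of X_train_2d still corresponds to label i
X_train_2d = X_train.reshape(len(X_train), -1)  # shape (100, 3072)

# Convert one-hot rows back to class indices: shape (100,), values 0 or 1
y_train_1d = y_train.argmax(axis=1)

clf = SVC()
clf.fit(X_train_2d, y_train_1d)
```

The same reshape has to be applied to X_test before calling `clf.predict`, and the predictions come back as class indices (0 or 1), not one-hot rows.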

Upvotes: 2
