Fane Spoitoru

Reputation: 35

X has 232 features, but StandardScaler is expecting 241 features as input

I want to make a prediction using KNN, and I have the following code:

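import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

def knn(trainImages, trainLabels, testImages, testLabels):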
    max = 0
    for i in range(len(trainImages)):
        if len(trainImages[i]) > max:
            max = len(trainImages[i])

    for i in range(len(trainImages)):
        aux = np.array(trainImages[i])
        aux.resize(max)
        trainImages[i] = aux

    max = 0
    for i in range(len(testImages)):
        if len(testImages[i]) > max:
            max = len(testImages[i])

    for i in range(len(testImages)):
        aux = np.array(testImages[i])
        aux.resize(max)
        testImages[i] = aux

    scaler = StandardScaler()
    scaler.fit(list(trainImages))

    trainImages = scaler.transform(list(trainImages))
    testImages = scaler.transform(list(testImages))

    classifier = KNeighborsClassifier(n_neighbors=5)
    classifier.fit(trainImages, trainLabels)

    pred = classifier.predict(testImages)

    print(classification_report(testLabels, pred))

I get the error at the line testImages = scaler.transform(list(testImages)). I understand that it is caused by the train and test arrays having different numbers of features. How can I solve it?

Upvotes: 1

Views: 4471

Answers (1)

Kaveh

Reputation: 4960

Scalers in scikit-learn expect input of shape (n_samples, n_features). If the second dimension differs between your train and test sets, that is not only an error in sklearn; it also makes no sense in theory. The n_features dimension of the train and test sets must be equal. The first dimension can differ, since it is the number of samples, and you can have any number of samples in either set.

When you call scaler.transform(test), it expects test to have the same number of features as the data you passed to scaler.fit(train). So all your images should be the same size.
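
To make the point above concrete, here is a minimal sketch that reproduces the mismatch with random data. The array names and sizes are made up for illustration; only the 241 and 232 feature counts come from the error message in the question.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.random((90, 241))     # 90 samples, 241 features
test_ok = rng.random((10, 241))   # same feature count as train
test_bad = rng.random((10, 232))  # different feature count

scaler = StandardScaler().fit(train)
print(scaler.n_features_in_)      # feature count recorded during fit

# Works: the second dimension matches what fit() saw.
scaled = scaler.transform(test_ok)

# Fails: the second dimension differs, so transform raises ValueError.
try:
    scaler.transform(test_bad)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
    print(mismatch_error)
```

The number of rows (90 vs. 10) never matters here; only the number of columns has to agree.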

For example, if you have 100 images, train_images should have a shape like (90,224,224,3) and test_images a shape like (10,224,224,3) (only the first dimension differs). Since StandardScaler takes 2-D input, each image is then flattened into one row of 224*224*3 = 150528 features.
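
As a rough illustration of those shapes, the sketch below builds placeholder arrays (all zeros, not real images) and flattens each image into one row, which is the 2-D layout a scaler or KNN classifier works with. The variable names are assumptions for the example.

```python
import numpy as np

# Placeholder image batches: (n_samples, height, width, channels).
train_images = np.zeros((90, 224, 224, 3), dtype=np.uint8)
test_images = np.zeros((10, 224, 224, 3), dtype=np.uint8)

# Flatten every image into a single feature vector: (n_samples, n_features).
train_flat = train_images.reshape(len(train_images), -1)
test_flat = test_images.reshape(len(test_images), -1)

print(train_flat.shape, test_flat.shape)
```

Both flattened arrays end up with the same second dimension, so a scaler fitted on train_flat can transform test_flat.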

So, try to resize your images like this:

import cv2
resized_image = cv2.resize(image, (224, 224))  # target size is (width, height); don't include the channel dimension
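
If your samples are already 1-D feature vectors of different lengths, as the code in the question suggests, another option is to pick one length from the training set and apply that same length to the test set. This is only a sketch under that assumption: pad_to_length is a hypothetical helper, not a library function, and zero-padding or truncating may not be meaningful for real image data, where resizing as above is usually the better choice.

```python
import numpy as np

def pad_to_length(samples, length):
    """Zero-pad or truncate each 1-D sample to exactly `length` values."""
    out = np.zeros((len(samples), length), dtype=float)
    for row, sample in zip(out, samples):
        values = np.asarray(sample, dtype=float).ravel()[:length]
        row[:len(values)] = values  # row is a view, so this fills `out`
    return out

# Toy data standing in for the variable-length samples.
train_raw = [[1, 2, 3], [4, 5], [6]]
test_raw = [[7, 8], [9, 10, 11, 12]]

# Decide the feature count once, from the training set only.
n_features = max(len(s) for s in train_raw)

train = pad_to_length(train_raw, n_features)
test = pad_to_length(test_raw, n_features)
print(train.shape, test.shape)
```

The key difference from the code in the question is that the target length is computed once and reused for both sets, instead of being recomputed from the test set.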

Upvotes: 2
