Henry Zhu

Reputation: 2618

Python scikit-learn: ValueError: Found input variables with inconsistent numbers of samples: [1173, 294]

I believe I am assigning the training and test x and y variables correctly in train_test_split. However, I used a TfidfVectorizer for x and a MultiLabelBinarizer for y, so x and y end up with different dimensions. When I fit the model, I get the following error:

ValueError: Found input variables with inconsistent numbers of samples: [1173, 294]

I haven't figured out a way to make the input and target have the same dimensions. Below is my code:

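from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.naive_bayes import MultinomialNB

# x is the TfidfVectorizer output, y is the MultiLabelBinarizer output
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.20)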

nb_clf = MultinomialNB()
sgd = SGDClassifier()
lr = LogisticRegression()
mn = MultinomialNB()

xTrain = csr_matrix(xTrain).toarray()
xTest = csr_matrix(xTest).toarray()
yTrain = csr_matrix(yTrain).toarray()

print("xTrain.shape = " + str(xTrain.shape))
print("xTest.shape = " + str(xTest.shape))
print("yTrain.shape = " + str(yTrain.shape))
print("yTest.shape = " + str(yTest.shape))

for classifier in [nb_clf, sgd, lr, mn]:
    clf = MultiOutputRegressor(classifier)
    clf.fit(xTrain.astype("U"), xTest.astype("U"))
    y_pred = clf.predict(yTest)
    print("\ny_pred:")
    print(y_pred)

Below is the output from the print statements:

xTrain.shape = (1173, 13725)
xTest.shape = (294, 13725)
yTrain.shape = (1173, 28)
yTest.shape = (294, 28)

Upvotes: 0

Views: 2716

Answers (3)

Ayse ILKAY

Reputation: 11

I got the same error. In my case the cause was that I predicted on X_train but used y_test when calculating the error, so the two arrays had different numbers of samples. Predicting on X_test instead of X_train solved the problem. Make sure each step uses the matching split.
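To make the mismatch concrete, here is a minimal sketch on synthetic data. The arrays, the 50-row size, and the choice of MultinomialNB with accuracy_score are assumptions for illustration, not the original poster's setup. It scores training-set predictions against test labels, which should raise the same kind of ValueError, and then scores test-set predictions instead.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in data (an assumption, not the data from the question).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(50, 8))  # non-negative count features
y = rng.integers(0, 2, size=50)       # binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)  # 40 training rows, 10 test rows

model = MultinomialNB().fit(X_train, y_train)

# Wrong: predictions for the 40 training rows compared with the 10 test labels.
try:
    accuracy_score(y_test, model.predict(X_train))
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)
print(mismatch_message)

# Right: predictions and labels come from the same split.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(y_pred.shape, y_test.shape)
```

The two numbers in the error message are the row counts of the two arrays being compared, so they usually point straight at which split was mixed up.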

Upvotes: 0

Mohsin hasan

Reputation: 837

Shouldn't you be fitting on xTrain and yTrain?

clf.fit(xTrain.astype("U"), yTrain.astype("U"))

Upvotes: 1

The Mask

Reputation: 579

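That's because you passed the wrong data to your model. The second argument to fit is xTest (294 rows), but it should be the training target yTrain (1173 rows, like xTrain); those are the two numbers in the error message. Correct this line of your code: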

clf.fit(xTrain.astype("U"), xTest.astype("U"))

to this:

clf.fit(xTrain.astype("U"), yTrain.astype("U"))
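For reference, a minimal end-to-end sketch of the corrected flow on synthetic data. Several things here are assumptions rather than part of the question: the random arrays stand in for the TF-IDF and MultiLabelBinarizer outputs, MultiOutputClassifier is swapped in for MultiOutputRegressor because the wrapped estimators are classifiers, and the astype("U") casts are dropped on the assumption that numeric features do not need them. Note that predict takes xTest, not yTest.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-ins (assumptions): 50 documents, 20 features, 3 labels.
rng = np.random.default_rng(0)
x = rng.random((50, 20))              # plays the role of the TF-IDF matrix
y = rng.integers(0, 2, size=(50, 3))  # plays the role of the binarized labels

xTrain, xTest, yTrain, yTest = train_test_split(
    x, y, test_size=0.20, random_state=0
)

# Features and targets passed to fit must have the same number of rows.
print("xTrain.shape =", xTrain.shape, "yTrain.shape =", yTrain.shape)

clf = MultiOutputClassifier(MultinomialNB())
clf.fit(xTrain, yTrain)      # training features with training targets
y_pred = clf.predict(xTest)  # predict from test features

# One predicted row per test sample, one column per label.
print("y_pred.shape =", y_pred.shape, "yTest.shape =", yTest.shape)
```

The column counts of x and y are allowed to differ (13725 features versus 28 labels in the question); only the row counts have to agree.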

Upvotes: 1
