Reputation: 2618
I know that I am assigning the training and test x and y variables correctly in train_test_split. However, I used a TfidfVectorizer for x and a MultiLabelBinarizer for y, so X and Y end up with different dimensions. As a result, I get the following error:
ValueError: Found input variables with inconsistent numbers of samples: [1173, 294]
I haven't found a way to give the input and target the same dimensions. Below is my code:
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.naive_bayes import MultinomialNB

xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.20)
nb_clf = MultinomialNB()
sgd = SGDClassifier()
lr = LogisticRegression()
mn = MultinomialNB()
xTrain = csr_matrix(xTrain).toarray()
xTest = csr_matrix(xTest).toarray()
yTrain = csr_matrix(yTrain).toarray()
print("xTrain.shape = " + str(xTrain.shape))
print("xTest.shape = " + str(xTest.shape))
print("yTrain.shape = " + str(yTrain.shape))
print("yTest.shape = " + str(yTest.shape))
for classifier in [nb_clf, sgd, lr, mn]:
    clf = MultiOutputRegressor(classifier)
    clf.fit(xTrain.astype("U"), xTest.astype("U"))
    y_pred = clf.predict(yTest)
    print("\ny_pred:")
    print(y_pred)
Below is the output from the print statements:
xTrain.shape = (1173, 13725)
xTest.shape = (294, 13725)
yTrain.shape = (1173, 28)
yTest.shape = (294, 28)
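A minimal sketch of why the different widths of the TF-IDF matrix and the label matrix are not the problem: scikit-learn only requires features and targets to have the same number of rows (samples). `sklearn.utils.check_consistent_length` is the helper that produces this exact message. The arrays below are hypothetical zero-filled stand-ins that keep the question's row counts but use far fewer columns, since columns do not affect the check.

```python
import numpy as np
from sklearn.utils import check_consistent_length

# Hypothetical stand-ins: row counts match the printed shapes, columns are shrunk.
x_train = np.zeros((1173, 10))
x_test = np.zeros((294, 10))
y_train = np.zeros((1173, 3))

# Features and labels may differ in width; only the number of rows must agree.
check_consistent_length(x_train, y_train)  # passes: 1173 rows each

# Using the test features as the target pairs 1173 rows with 294 rows.
message = ""
try:
    check_consistent_length(x_train, x_test)
except ValueError as exc:
    message = str(exc)
print(message)
```

This mirrors the `clf.fit(xTrain..., xTest...)` call above: the target passed to `fit` has 294 rows while the features have 1173.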
Upvotes: 0
Views: 2716
Reputation: 11
I also got the same error. In my case, I predicted on X_train but calculated the error against y_test, so the two arrays had different numbers of rows. Predicting on X_test instead of X_train solved the problem. Make sure every step pairs arrays from the same split; "Found input variables with inconsistent numbers of samples" means exactly that kind of size mismatch.
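To illustrate this answer, here is a sketch on synthetic data from `make_multilabel_classification`, which stands in (as an assumption) for the real features and labels. Predictions come from `X_test`, so they line up row for row with `y_test`; scoring predictions made from `X_train` against `y_test` reproduces the error. `MultiOutputClassifier`, `LogisticRegression`, and `hamming_loss` are my choices for the example, not something taken from the thread.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multilabel data: X is (200, 20), Y is a (200, 4) 0/1 indicator matrix.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)  # 160 rows of features with 160 rows of labels

# Predict from the test features so rows match y_test.
y_pred = clf.predict(X_test)
print(y_pred.shape, y_test.shape)
print("Hamming loss:", hamming_loss(y_test, y_pred))

# Scoring predictions from X_train (160 rows) against y_test (40 rows) fails.
mismatch_error = ""
try:
    hamming_loss(y_test, clf.predict(X_train))
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```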
Upvotes: 0
Reputation: 837
Shouldn't you be fitting on xTrain and yTrain?
clf.fit(xTrain.astype("U"), yTrain.astype("U"))
Upvotes: 1
Reputation: 579
That's because you passed the wrong arrays to your model: the test features (xTest) as the target instead of the training labels (yTrain). Correct this line of your code:
clf.fit(xTrain.astype("U"), xTest.astype("U"))
to this:
clf.fit(xTrain.astype("U"), yTrain.astype("U"))
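A fuller sketch of the question's loop with that fix applied, assuming a hypothetical toy corpus in place of the real data. Beyond the fix in this answer, it makes three further changes that are my own choices, not the asker's: it uses `MultiOutputClassifier` instead of `MultiOutputRegressor` because the targets are 0/1 labels; it drops `.astype("U")`, since turning numeric TF-IDF values into strings is likely to make the estimators fail or misbehave; and it skips `.toarray()`, because these estimators accept the sparse TF-IDF matrix directly. It also calls `predict` on the test features rather than on `yTest`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy corpus: each tag appears in half of the 30 documents.
words = {"python": "python pandas numpy", "web": "html css javascript", "data": "sql database query"}
docs, tags = [], []
for combo in [("python",), ("web",), ("data",), ("python", "data"), ("web", "data"), ("python", "web")]:
    for _ in range(5):
        docs.append(" ".join(words[t] for t in combo))
        tags.append(list(combo))

x = TfidfVectorizer().fit_transform(docs)   # sparse, shape (30, n_terms)
y = MultiLabelBinarizer().fit_transform(tags)  # dense 0/1, shape (30, 3)
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.20, random_state=0)

predictions = {}
for classifier in [MultinomialNB(), SGDClassifier(random_state=0), LogisticRegression()]:
    clf = MultiOutputClassifier(classifier)
    clf.fit(xTrain, yTrain)      # features and labels from the same (training) split
    y_pred = clf.predict(xTest)  # predict from test features, then compare with yTest
    predictions[type(classifier).__name__] = y_pred
    print(type(classifier).__name__, y_pred.shape)
```

Each `y_pred` has the same shape as `yTest`, so it can be passed to a metric together with `yTest`.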
Upvotes: 1