Found input variables with inconsistent numbers of samples error

Question

I wrote the following code to compute a score for a machine learning model, but I get the error below. What could be the reason?

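import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

veri = pd.read_csv("deneme2.csv")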

veri = veri.drop(['id'], axis=1)

y = veri[['Rating']]
x = veri.drop(['Rating','Genres'], axis=1)


X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)


DTR = DecisionTreeRegressor()
DTR.fit(X_train,y_train)
ytahmin = DTR.predict(x)
DTR.fit(veri[['Reviews','Size','Installs','Type','Price','Content Rating','Category_c']],veri.Rating)
basari_DTR = DTR.score(X_test,y_test)
#print("DecisionTreeRegressor: score of", basari_DTR*100, "percent")
a = np.array([159,19000000.0,10000,0,0.0,0,0]).reshape(1, -1)
predict_DTR = DTR.predict(a)
print(f1_score(y_train, y_test, average='macro'))

Error: Found input variables with inconsistent numbers of samples: [6271, 3089]

desertnaut · Accepted Answer

There are at least two issues with your code.

The first error you report

print(f1_score(y_train, y_test, average='macro')) 
Error: Found input variables with inconsistent numbers of samples: [6271, 3089]

is due to your y_train and y_test having different lengths, as already pointed out in the other answer.
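To see where the two numbers come from, here is a minimal sketch with synthetic data standing in for deneme2.csv. The row count of 9360 is an assumption, chosen only because it reproduces the 6271 / 3089 split in the error message with test_size=0.33; the feature values are random and carry no meaning.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV; 9360 rows is an assumption chosen so that
# test_size=0.33 reproduces the 6271 / 3089 split from the error message.
rng = np.random.default_rng(0)
x = rng.normal(size=(9360, 7))
y = rng.uniform(1.0, 5.0, size=9360)

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=0
)

# The two target arrays have different lengths, so no metric can pair
# them element by element.
print(len(y_train), len(y_test))
```

A metric always compares true values with predictions for the same samples, so the second argument should come from something like DTR.predict(X_test), which has the same length as y_test.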

But this is not the main issue here, because, even if you change y_train to y_pred, as suggested, you get a new error:

print(f1_score(y_pred, y_test, average='macro')) 
Error: continuous is not supported

This is simply because you are in a regression setting, while the f1 score is a classification metric and, as such, it does not work with continuous predictions.

In other words, the f1 score is inappropriate for your (regression) problem, hence the error.
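The second error can be reproduced in isolation. The sketch below uses a few made-up rating values (hypothetical, not taken from your data) and checks how scikit-learn classifies the target before calling f1_score on it.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical continuous targets and predictions, as a regressor would produce.
y_test = np.array([4.1, 3.7, 4.5, 2.9])
y_pred = np.array([4.0, 3.9, 4.4, 3.1])

# scikit-learn inspects the target values to decide what kind of problem it is.
target_type = type_of_target(y_test)
print(target_type)

# Classification metrics refuse continuous targets with a ValueError.
try:
    f1_score(y_test, y_pred, average='macro')
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

The lengths match here, so the only remaining problem is the type of the target.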

Check the list of metrics available in scikit-learn, where you can confirm that the f1 score is used only in classification, and pick another metric suitable for regression problems.
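As a rough sketch of what that could look like, the example below fits a DecisionTreeRegressor on synthetic data (an assumption, since deneme2.csv is not available here) and evaluates it with three regression metrics from sklearn.metrics. Which metric fits your problem best is your call; these are only common choices.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the real features and Rating column.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 7))
y = 3.0 + 0.5 * x[:, 0] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=0
)

DTR = DecisionTreeRegressor(random_state=0)
DTR.fit(X_train, y_train)

# Predict on the held-out features so y_pred lines up with y_test.
y_pred = DTR.predict(X_test)

r2 = r2_score(y_test, y_pred)              # coefficient of determination
mae = mean_absolute_error(y_test, y_pred)  # average absolute error
mse = mean_squared_error(y_test, y_pred)   # average squared error
print(r2, mae, mse)
```

Note that DTR.score(X_test, y_test) already returns the R^2 value for a regressor, so the basari_DTR line in the question is a valid regression score on its own.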

For a more detailed exposition of what happens when you choose an inappropriate metric in scikit-learn, see "Accuracy Score ValueError: Can't Handle mix of binary and continuous target".
