Murat Kılınç

Reputation: 175

Found input variables with inconsistent numbers of samples error

I wrote the following code to compute a score for a machine learning model, but I get the error shown below. What could be the reason?

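import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

veri = pd.read_csv("deneme2.csv")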

veri = veri.drop(['id'], axis=1)

y = veri[['Rating']]
x = veri.drop(['Rating','Genres'], axis=1)


X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)


DTR = DecisionTreeRegressor()
DTR.fit(X_train,y_train)
ytahmin = DTR.predict(x)
DTR.fit(veri[['Reviews','Size','Installs','Type','Price','Content Rating','Category_c']],veri.Rating)
basari_DTR = DTR.score(X_test,y_test)
#print("DecisionTreeRegressor: success rate of", basari_DTR*100, "percent")
a = np.array([159,19000000.0,10000,0,0.0,0,0]).reshape(1, -1)
predict_DTR = DTR.predict(a)
print(f1_score(y_train, y_test, average='macro')) 

Error: Found input variables with inconsistent numbers of samples: [6271, 3089]

Upvotes: 1

Views: 3997

Answers (2)

desertnaut

Reputation: 60399

There are at least two issues with your code.

The first error you report

print(f1_score(y_train, y_test, average='macro')) 
Error: Found input variables with inconsistent numbers of samples: [6271, 3089]

is due to your y_train and y_test having different lengths, as already pointed out in the other answer.
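A minimal sketch of how this length mismatch arises, using synthetic stand-in data rather than your CSV (the array shapes and the random seed are assumptions made only for illustration):

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the asker's CSV): 100 rows, 3 features, 0/1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# The two target arrays come from different subsets, so their lengths differ.
print(len(y_train), len(y_test))

# Comparing the training targets with the test targets fails the length check.
try:
    f1_score(y_train, y_test, average="macro")
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

A metric compares true values with predictions for the same samples, so both arguments have to come from the same split.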

But this is not the main issue here, because, even if you change y_train to y_pred, as suggested, you get a new error:

print(f1_score(y_pred, y_test, average='macro')) 
Error: continuous is not supported 

This is simply because you are in a regression setting, while the f1 score is a classification metric and, as such, it does not work with continuous predictions.

In other words, f1 score is inappropriate for your (regression) problem, hence the error.
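This second error can be reproduced without any model at all. The sketch below uses made-up rating values (hypothetical, not taken from your data) with matching lengths, so only the target type is at issue:

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical ratings: real-valued targets, as in a regression problem.
y_test = np.array([4.1, 3.9, 4.5, 4.7])
y_pred = np.array([4.0, 4.2, 4.4, 4.6])

# Lengths match, but f1_score still rejects continuous targets.
try:
    f1_score(y_test, y_pred, average="macro")
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```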

Check the list of metrics available in scikit-learn, where you can confirm that f1 score is used only in classification, and pick another metric that is suitable for regression problems.
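As a rough illustration, three common regression metrics from sklearn.metrics can be applied to real-valued targets as follows. The numbers are invented for the example, and which metric fits your problem best is for you to judge:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true ratings and predictions, for illustration only.
y_true = np.array([4.0, 3.5, 4.5, 5.0])
y_pred = np.array([4.2, 3.0, 4.5, 4.6])

mae = mean_absolute_error(y_true, y_pred)  # average absolute error
mse = mean_squared_error(y_true, y_pred)   # average squared error
r2 = r2_score(y_true, y_pred)              # coefficient of determination

print(mae, mse, r2)
```

All three take the true values first and the predictions second.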

For a more detailed exposition of what happens when you choose an inappropriate metric in scikit-learn, see the question "Accuracy Score ValueError: Can't Handle mix of binary and continuous target".

Upvotes: 2

Szymon Maszke

Reputation: 24914

f1_score needs the true y values from the test set and the values you predicted on the test set, so the last lines should be:

DTR = DecisionTreeRegressor()
DTR.fit(X_train,y_train)

y_pred = DTR.predict(X_test)
print(f1_score(y_pred, y_test, average='macro')) 

You shouldn't call fit twice, and your predictions must have the same length as the test targets. See the basic scikit-learn tutorials for more info.
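The fit-once, predict-on-test pattern might look like the sketch below. It assumes synthetic data from make_regression in place of your CSV, and it swaps in r2_score (a regression metric) because the model is a regressor; neither choice comes from your code:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the real dataset (an assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

DTR = DecisionTreeRegressor(random_state=0)
DTR.fit(X_train, y_train)      # fit once, on the training split only
y_pred = DTR.predict(X_test)   # predict on the held-out split

# y_pred and y_test now describe the same samples, so their lengths match.
r2 = r2_score(y_test, y_pred)
score = DTR.score(X_test, y_test)  # for regressors, score() also returns R^2
print(len(y_pred), len(y_test), r2, score)
```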

Upvotes: 1
