Reputation: 175
I wrote the following code to compute the score of a machine learning model, but I get the error shown below. What could be the reason?
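import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

veri = pd.read_csv("deneme2.csv")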
veri = veri.drop(['id'], axis=1)
y = veri[['Rating']]
x = veri.drop(['Rating','Genres'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)
DTR = DecisionTreeRegressor()
DTR.fit(X_train,y_train)
ytahmin = DTR.predict(x)
DTR.fit(veri[['Reviews','Size','Installs','Type','Price','Content Rating','Category_c']],veri.Rating)
basari_DTR = DTR.score(X_test,y_test)
#print("DecisionTreeRegressor score:", basari_DTR*100, "percent")
a = np.array([159,19000000.0,10000,0,0.0,0,0]).reshape(1, -1)
predict_DTR = DTR.predict(a)
print(f1_score(y_train, y_test, average='macro'))
Error: Found input variables with inconsistent numbers of samples: [6271, 3089]
Upvotes: 1
Views: 3997
Reputation: 60399
There are at least two issues with your code.
The first error you report
print(f1_score(y_train, y_test, average='macro'))
Error: Found input variables with inconsistent numbers of samples: [6271, 3089]
is due to your y_train and y_test having different lengths, as already pointed out in the other answer.
But this is not the main issue here, because even if you change y_train to y_pred, as suggested, you get a new error:
print(f1_score(y_pred, y_test, average='macro'))
Error: continuous is not supported
This is simply because you are in a regression setting, while the f1 score is a classification metric and, as such, it does not work with continuous predictions.
In other words, f1 score is inappropriate for your (regression) problem, hence the error.
Check the list of metrics available in scikit-learn, where you can confirm that f1 score is used only in classification, and pick another metric suitable for regression problems.
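As a minimal sketch of why this happens: scikit-learn infers the type of the target before computing a classification metric, and float ratings are treated as continuous. The rating values below are hypothetical stand-ins, not taken from deneme2.csv.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical rating values, used only for illustration.
y_true = np.array([4.1, 3.9, 4.7, 4.3])
y_pred = np.array([4.0, 4.2, 4.5, 4.4])

# Non-integer floats are classified as a 'continuous' target.
target_type = type_of_target(y_true)
print(target_type)

# A classification metric rejects continuous targets with a ValueError.
try:
    f1_score(y_true, y_pred, average='macro')
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The exact wording of the error message may vary between scikit-learn versions.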
For a more detailed explanation of what happens when you choose an inappropriate metric in scikit-learn, see "Accuracy Score ValueError: Can't Handle mix of binary and continuous target".
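For illustration, here is a sketch of evaluating a DecisionTreeRegressor with regression metrics instead. It assumes synthetic data from make_regression in place of the real CSV, and the choice of MAE, MSE and R^2 is only one reasonable option among the regression metrics scikit-learn offers.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real dataset (assumption: 7 numeric features).
X, y = make_regression(n_samples=300, n_features=7, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Regression metrics take (y_true, y_pred) and accept continuous values.
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("MAE:", mae)
print("MSE:", mse)
print("R^2:", r2)

# For regressors, model.score(X_test, y_test) returns the same R^2 value.
print("score:", model.score(X_test, y_test))
```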
Upvotes: 2
Reputation: 24914
f1_score needs the true y values from the test set and the values you predicted on that same test set, hence the last lines should be:
DTR = DecisionTreeRegressor()
DTR.fit(X_train,y_train)
y_pred = DTR.predict(X_test)
print(f1_score(y_pred, y_test, average='macro'))
You shouldn't call fit twice, and your predictions must have the same length as the test targets. See some basic sklearn tutorials for more info.
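The length requirement can be sketched as follows. The row count of 9360 is an assumption, inferred only from 6271 + 3089 in the error message, and the random features and ratings are placeholders for the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_rows = 9360  # assumed total: 6271 + 3089 from the error message
X = rng.normal(size=(n_rows, 7))        # placeholder features
y = rng.uniform(1.0, 5.0, size=n_rows)  # placeholder ratings

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
# The two targets have different lengths, so they cannot be compared.
print(len(y_train), len(y_test))  # 6271 3089

# Fit once on the training split, then predict on the test split only.
model = DecisionTreeRegressor(max_depth=3, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Predictions now line up one-to-one with y_test.
print(len(y_pred) == len(y_test))
```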
Upvotes: 1