Why is scikit-learn r2 score zero?

Question

I have a training dataset in which every Y value is 0.75, and my model predicts a score for each row as a regression. But when I calculate r2, it comes out as zero, and I can't see why.

I found only one other similar question (Scikit-learn R2 always zero), but applying the answer given there hasn't helped, so I'm not sure where I'm going wrong.

What I have is this:

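import numpy as np
from sklearn.metrics import r2_score

df["Score"] = 0.75                     # every target value is the same constant
Y = df["Score"]
df_valid = df.drop(columns=["Score"])  # features only (positional axis args were removed in pandas 2.0)

y_pred = model.predict(df_valid)  # model is a RandomForestRegressor from sklearn, fed the features only

prediction = np.array(y_pred)
training = np.array(Y)

print(prediction)
print(training)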


[0.77279743 0.18169051 0.81874664 0.75440987 0.67748983 0.56747803
 0.66120282 0.5829188  0.73471978 0.57745964 0.48272321 0.65313173
 0.805028   0.63791055 0.49677642 0.64341235 0.55456506 0.52329214
 0.67690119 0.79450821 0.63378986 0.69522612 0.69802982 0.6719472
 0.67977281 0.29016943 0.56192242 0.16265814 0.57813068 0.72598279
 0.50255597 0.77138968 0.53745061 0.527479   0.67161703 0.64326146
 0.5299367  0.79977403 0.73527391 0.50858258 0.74660319 0.72315073
 0.71879784 0.55134538 0.61812615 0.64722909 0.67055658 0.68687499
 0.73416035 0.4781765  0.74878142 0.5773583 ]
[0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75
 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75
 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75
 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75]

Both prediction and training are NumPy arrays of the same shape. Am I missing something else?

When I try print(r2_score(training, prediction)), it gives me 0.

Alex Serra Marrugat · Accepted Answer

R2 score will be 0 when y_true is always the same value and the predictions do not match it exactly. In your case, y_true is always the same value (0.75).

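Going deeper into the formula, R2 is calculated as:

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$$

And SStot is calculated as:

$$SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$$

where y_i are the true values, ŷ_i the predictions and ȳ the mean of y_true.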
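As a quick sanity check of the definitions R2 = 1 - SSres/SStot, with SSres the sum of squared residuals and SStot the sum of squared deviations from the mean, here is a minimal sketch that computes R2 by hand with NumPy and compares it with scikit-learn's r2_score. The helper r2_by_hand and the arrays are hypothetical, made-up illustrations (a non-constant y_true, so SStot > 0), not part of the question's data.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_by_hand(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot, straight from the definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


# Illustrative, made-up values; y_true is NOT constant, so SS_tot > 0
y_true = np.array([0.5, 0.75, 1.0, 0.25])
y_pred = np.array([0.55, 0.7, 0.9, 0.3])

print(r2_by_hand(y_true, y_pred))
print(r2_score(y_true, y_pred))
```

Both calls should print a value close to 0.944 (SSres = 0.0175, SStot = 0.3125). Because r2_by_hand divides by SStot directly, on the constant y_true from the question it would return -inf (or nan for perfect predictions) rather than 0; r2_score handles that case specially.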

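In your case, every y_true - ymean term is 0 (0.75 - 0.75 = 0), so SStot = 0 and calculating R2 would mean dividing by 0. scikit-learn catches this case: when SStot is 0, r2_score returns 1.0 if the predictions are perfect (SSres is also 0) and 0.0 otherwise, which is the 0 you are getting. Since scikit-learn 1.1, passing force_finite=False makes it return nan or -inf instead.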

On the other hand, if every predicted value equals the mean of y_true, SSres and SStot are the same, and R2 is also 0. A constant prediction that differs from that mean makes SSres larger than SStot, so R2 becomes negative.
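The constant-prediction case can be checked with a small made-up example (the y_true values below are illustrative, not from the question): a constant prediction equal to the mean of y_true gives exactly 0, while any other constant gives a negative R2.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative, non-constant targets; their mean is 0.625
y_true = np.array([0.5, 0.75, 1.0, 0.25])

# Constant prediction equal to the mean: SS_res == SS_tot, so R^2 == 0
at_mean = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, at_mean))  # 0.0

# Constant prediction away from the mean: SS_res > SS_tot, so R^2 < 0
off_mean = np.full_like(y_true, 0.75)
print(r2_score(y_true, off_mean))
```

With a constant prediction of 0.75, SSres = 0.375 and SStot = 0.3125, so R2 = 1 - 1.2 = -0.2 (up to floating-point rounding).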

Consult this link for more information on how R2 is calculated; it is explained quite well.