Logan Wilson

Reputation: 51

H2O giving a different R^2 than calculating manually?

I am confused about how H2O calculates R^2. I created a dummy dataframe and fit H2O's H2ORandomForestEstimator on it:

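import pandas as pd
import h2o
from h2o.estimators import H2ORandomForestEstimator
h2o.init()  # start or connect to a local H2O cluster
# tiny dummy dataset: one predictor x, one response y
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [3, 9, 2, 8, 1]})
h2o_df = h2o.H2OFrame(df)
rf = H2ORandomForestEstimator()
rf.train(x='x', y='y', training_frame=h2o_df)
rf.r2()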

This returns -0.667, which would indicate a pretty poor fit! But I calculated R^2 with the predict method:

y_true = df.y
y_pred = rf.predict(h2o_df).as_data_frame().predict
SSE = sum((y_pred-y_true)**2)
SST = sum((y_true-y_true.mean())**2)
r2 = 1-(SSE/SST)
r2

This returns 0.727, which makes a lot more sense. What is happening internally with the .r2() method?

Upvotes: 0

Views: 255

Answers (1)

Logan Wilson

Reputation: 51

Pretty sure this is a bug. As a workaround, rf.model_performance(h2o_df).r2() returns the correct value for R^2 (the same as when calculating manually).
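
For reference, here is a minimal sketch of the manual calculation from the question, written in plain Python so it runs without H2O. The helper name r_squared is hypothetical (it is not part of H2O), and the prediction lists are made-up values for illustration only, not output from the model. It also shows that R^2 becomes negative whenever the predictions have a larger squared error than simply predicting the mean of y.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SSE/SST (hypothetical helper)."""
    mean_y = sum(y_true) / len(y_true)
    # SSE: squared error of the predictions against the observed values
    sse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    # SST: squared deviation of the observed values from their mean
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - sse / sst


y = [3, 9, 2, 8, 1]  # the y column from the question

perfect = r_squared(y, y)                      # predictions equal to the data
mean_only = r_squared(y, [sum(y) / len(y)] * len(y))  # always predict the mean
worse = r_squared(y, [9, 3, 8, 2, 9])          # illustrative bad predictions

print(perfect, mean_only, worse)
```

Perfect predictions give 1, predicting the mean gives 0, and anything worse than the mean gives a negative value, so which rows a metric is evaluated on matters a great deal on a five-row dataset.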

Upvotes: 1
