Reputation: 51
I am confused about how H2O calculates R^2. I created a dummy dataframe and trained H2O's H2ORandomForestEstimator on it:
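import pandas as pd
import h2o
from h2o.estimators import H2ORandomForestEstimator
h2o.init()  # start or connect to a local H2O cluster
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [3, 9, 2, 8, 1]})
h2o_df = h2o.H2OFrame(df)
rf = H2ORandomForestEstimator()
rf.train(x='x', y='y', training_frame=h2o_df)
rf.r2()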
This returns -0.667, which would indicate a pretty poor fit! But when I calculate R^2 by hand from the output of the predict method:
y_true = df.y
y_pred = rf.predict(h2o_df).as_data_frame().predict
SSE = sum((y_pred-y_true)**2)
SST = sum((y_true-y_true.mean())**2)
r2 = 1-(SSE/SST)
r2
This returns 0.727, which makes a lot more sense. What is happening internally with the .r2() method?
Upvotes: 0
Views: 255
Reputation: 51
Pretty sure this is a bug. As a workaround, rf.model_performance(h2o_df).r2() returns the correct value for R^2 (the same as when calculating manually).
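For reference, the manual calculation from the question can be wrapped in a small helper. This is only a sketch of the textbook formula 1 - SSE/SST in plain Python; the name r_squared and the "shuffled" predictions are made up for illustration and are not H2O output. It shows that R^2 is 1 for a perfect fit, 0 when always predicting the mean, and negative whenever the predictions are worse than the mean, so a negative value by itself is not impossible.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SSE/SST."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared errors between targets and predictions
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the targets
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - sse / sst


y_true = [3, 9, 2, 8, 1]  # y column from the question
mean_only = [sum(y_true) / len(y_true)] * len(y_true)  # always predict the mean
shuffled = [9, 3, 8, 2, 1]  # hypothetical bad predictions, not from H2O

print(r_squared(y_true, y_true))     # perfect fit
print(r_squared(y_true, mean_only))  # no better than the mean
print(r_squared(y_true, shuffled))   # worse than the mean, so negative
```

Passing the predictions from rf.predict(h2o_df) into this helper should reproduce the manual value from the question.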
Upvotes: 1