Matthew Son

Reputation: 1415

H2O's RMSE performance report not consistent

I'm wondering why the RMSE reported by h2o.performance differs from the RMSE I get by applying the standard definition to the test data. H2O's report seems to overstate the model's performance: its RMSE is much lower than mine.

Below is a reprex.


library(h2o)
h2o.init()  # start or connect to a local H2O cluster

iris_h2o = as.h2o(iris)
parts = h2o.splitFrame(iris_h2o, ratios = c(0.5,0.25), seed = 1)
train = parts[[1]]
valid = parts[[2]]
test = parts[[3]]

x = c('Sepal.Width','Petal.Length','Petal.Width')
y = 'Sepal.Length'
auto_gbm = h2o.automl(x= x,
                      y= y,
                      training_frame = train,
                      validation_frame = valid,
                      nfolds = 0,
                      include_algos = c('GBM'),
                      max_models = 5,
                      seed = 1
                      )
best_gbm = h2o.get_best_model(auto_gbm)
 
h2o.performance(best_gbm, test)

The call above reports:

H2ORegressionMetrics: gbm

MSE:  0.1152907
RMSE:  0.3395449
MAE:  0.2675279
RMSLE:  0.04744378
Mean Residual Deviance :  0.1152907

However, when I generate predictions on the test dataset and calculate the RMSE manually, the value is very different.

rmse = function(y, y_predict){
  N = length(y)
  RMSE = sqrt(sum((y-y_predict)^2,na.rm=T)/N)
  return(RMSE)
}

test['predicted'] = h2o.predict(best_gbm, test)

rmse(test['Sepal.Length'], test['predicted'])

[1] 1.890506

RMSE from H2O's performance report: 0.33

RMSE from my manual calculation: 1.89

The manual value is more than 5 times larger. Why am I seeing this inconsistency?

H2O cluster version: 3.36.1.4

Upvotes: 1

Views: 66

Answers (1)

phiver

Reputation: 23608

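Your rmse function has a mistake: length(y) does not return what you think it does. You can check this with length(test['Sepal.Length']), which returns 1 and not the 31 rows you expect. Use nrow to get the number of rows instead. Dividing the sum of squared errors by 1 rather than 31 inflates the result by a factor of sqrt(31), about 5.57, which is exactly the gap between 1.890506 and 0.3395449. Your function should look like this: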

rmse = function(y, y_predict){
  N = nrow(y)
  RMSE = sqrt(sum((y-y_predict)^2,na.rm=T)/N)
  return(RMSE)
}

rmse(test['Sepal.Length'], test['predicted'])
[1] 0.3395448
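
To see the length versus nrow difference without starting an H2O cluster, here is a minimal base-R sketch. It assumes a one-column data.frame is a fair stand-in for the single-column frame test['Sepal.Length']; the names actual, predicted, rmse_wrong, rmse_right and rmse_vec are made up for illustration, and the predictions are simulated noise rather than output from the GBM above.

```r
# Base-R illustration only: a one-column data.frame stands in for the
# single-column H2OFrame test['Sepal.Length'] (an assumption, not H2O code).
set.seed(1)
actual    <- data.frame(y = iris$Sepal.Length[1:31])
predicted <- data.frame(y = actual$y + rnorm(31, sd = 0.3))  # simulated predictions

length(actual)  # number of columns
nrow(actual)    # number of rows

# Same sum of squared errors, two different denominators
sq_err     <- sum((actual - predicted)^2)
rmse_wrong <- sqrt(sq_err / length(actual))  # divides by the column count
rmse_right <- sqrt(sq_err / nrow(actual))    # divides by the row count

# The ratio depends only on the row count, not on the data
rmse_wrong / rmse_right

# On plain numeric vectors, mean() avoids computing N by hand
rmse_vec <- function(y, y_predict) sqrt(mean((y - y_predict)^2, na.rm = TRUE))
rmse_vec(actual$y, predicted$y)
```

The ratio rmse_wrong / rmse_right equals sqrt(31) regardless of the simulated noise, which matches the factor between the two numbers in the question.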

Upvotes: 2
