jatin singh

Code for an H2O ensemble implementation in R for regression

I have searched various portals, including the H2O Ensemble documentation, but all I have found are ensemble examples for binary classification problems. I have not found a single example showing how to implement general stacking or H2O ensembling for a simple regression problem in R.

Could anyone please share working code showing how to implement H2O ensembling or stacking for a regression problem in R, or, failing that, a simple ensembling approach for regression in R?

I only want to know how ensembling/stacking is implemented for regression with varying weights.

Answers (2)

Darren Cook

The stacked ensemble example in my book (Practical Machine Learning with H2O) is a regression problem (on the building energy dataset). :-)

But if you ever think you've exhausted all the H2O documentation, try searching the source code on GitHub. Here is their unit test for stacked ensemble regression:

https://github.com/h2oai/h2o-3/blob/master/h2o-r/tests/testdir_algos/stackedensemble/runit_stackedensemble_gaussian.R

Lauren

Here's an example of building a stacked ensemble for a regression problem (predicting age) in R:

library('h2o')
h2o.init()

files3 <- "http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
col_types <- c("Numeric","Numeric","Numeric","Enum","Enum","Numeric","Numeric","Numeric","Numeric")
dat <- h2o.importFile(files3, destination_frame = "prostate.hex", col.types = col_types)
ss <- h2o.splitFrame(dat, ratios = 0.8, seed = 1)
train <- ss[[1]]
test <- ss[[2]]

x <- c("CAPSULE","GLEASON","RACE","DPROS","DCAPS","PSA","VOL")
y <- "AGE"
nfolds <- 5  # all base models must share the same nfolds and fold_assignment to be stacked


# Train & Cross-validate a GBM
my_gbm <- h2o.gbm(x = x, 
                  y = y, 
                  training_frame = train, 
                  distribution = "gaussian",
                  max_depth = 3,
                  learn_rate = 0.2,
                  nfolds = nfolds, 
                  fold_assignment = "Modulo",
                  keep_cross_validation_predictions = TRUE,
                  seed = 1)

# Train & Cross-validate a random forest (RF)
my_rf <- h2o.randomForest(x = x,
                          y = y, 
                          training_frame = train, 
                          ntrees = 30, 
                          nfolds = nfolds, 
                          fold_assignment = "Modulo",
                          keep_cross_validation_predictions = TRUE,
                          seed = 1)


# Train & Cross-validate an extremely randomized RF
my_xrf <- h2o.randomForest(x = x,
                           y = y, 
                           training_frame = train, 
                           ntrees = 50,
                           histogram_type = "Random",
                           nfolds = nfolds, 
                           fold_assignment = "Modulo",
                           keep_cross_validation_predictions = TRUE,
                           seed = 1)

# Train a stacked ensemble using the models above
stack <- h2o.stackedEnsemble(x = x, 
                             y = y, 
                             training_frame = train,
                             validation_frame = test,  # optional: also report metrics on the held-out test set
                             model_id = "my_ensemble_gaussian", 
                             base_models = list(my_gbm@model_id, my_rf@model_id, my_xrf@model_id))

# Predict on the test set
pred <- h2o.predict(stack, newdata = test)
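To show what "varying weights" means in a stacked regression ensemble, here is a minimal base-R sketch of the same idea without H2O. It is an illustration under stated assumptions, not H2O's internals. The data are synthetic, and the names `base_formulas` and `stack_predict` are hypothetical. Each base learner's out-of-fold predictions become the inputs to a linear metalearner, and the metalearner's fitted coefficients are the learned weights. H2O uses a GLM as its default metalearner. This sketch substitutes an unconstrained `lm()` to keep it simple.

```r
# Hypothetical illustration in base R only (no H2O): stacking for regression.
set.seed(1)

# Synthetic regression data (assumed; not the prostate data above)
n  <- 200
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
y  <- 3 * x1 + x2^2 + rnorm(n, sd = 0.5)
dat <- data.frame(x1 = x1, x2 = x2, y = y)

# Two deliberately different base learners (hypothetical choices)
base_formulas <- list(
  linear = y ~ x1 + x2,
  quad   = y ~ x1 + I(x2^2)
)

# Row i goes to fold ((i - 1) mod k) + 1, similar in spirit to fold_assignment = "Modulo"
k <- 5
folds <- (seq_len(n) - 1) %% k + 1

# Out-of-fold (cross-validated) predictions form the metalearner's training data
oof <- matrix(NA_real_, nrow = n, ncol = length(base_formulas),
              dimnames = list(NULL, names(base_formulas)))
for (f in seq_len(k)) {
  held_out <- folds == f
  for (m in names(base_formulas)) {
    fit <- lm(base_formulas[[m]], data = dat[!held_out, ])
    oof[held_out, m] <- predict(fit, newdata = dat[held_out, ])
  }
}

# Metalearner: regress y on the out-of-fold predictions.
# Its coefficients are the learned per-model weights (plus an intercept).
level_one <- data.frame(oof, y = dat$y)
meta <- lm(y ~ linear + quad, data = level_one)
weights <- coef(meta)
print(weights)

# Refit each base model on all rows so new data can be scored
full_fits <- lapply(base_formulas, lm, data = dat)

# Score new rows: base-model predictions, then combine them with the metalearner
stack_predict <- function(newdata) {
  base_preds <- as.data.frame(lapply(full_fits, predict, newdata = newdata))
  predict(meta, newdata = base_preds)
}

new_rows <- data.frame(x1 = c(0, 1), x2 = c(1, -1))
print(stack_predict(new_rows))
```

The weights are fitted on out-of-fold predictions rather than chosen by hand. As a result, a base model that adds nothing beyond the others, such as `linear` here, ends up with a weight near zero.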
