Reputation: 23
I need to perform parameter optimization on a GBM model in H2O from R. I am relatively new to H2O, and I think I need to convert ntrees and learn_rate (below) into H2O vectors before running the loop below. How do I perform this operation? Thanks!
ntrees <- c(100,200,300,400)
learn_rate <- c(1,0.5,0.1)
for (i in ntrees){
  for j in learn_rate{
    n = ntrees[i]
    l = learn_rate[j]
    gbm_model <- h2o.gbm(features, label, training_frame = train, validation_frame = valid, ntrees=ntrees[[i]], max_depth = 5, learn_rate=learn_rate[j])
    print(c(ntrees[i], learn_rate[j], h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
  }
}
Upvotes: 2
Views: 384
Reputation: 28913
Lauren's answer, to use grids, is the best one here. I'll just quickly point out that what you have written is a usable approach, and one you can fall back on when grids don't do something you need.
Your example didn't include any data (see https://stackoverflow.com/help/mcve) so I couldn't run it, but I corrected the couple of syntax issues I noticed (R's for-in loop gives you the value directly, not the index, and the second for loop was missing parentheses):
ntrees <- c(100, 200, 300, 400)
learn_rate <- c(1, 0.5, 0.1)
for (n in ntrees){
  for (l in learn_rate){
    gbm_model <- h2o.gbm(
      features, label, training_frame = train, validation_frame = valid,
      ntrees = n, max_depth = 5, learn_rate = l
    )
    print(c(n, l, h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
  }
}
An example of when you'd use nested loops like this is when you want to skip certain combinations. E.g. you might decide to only test an ntrees of 100 with a learn rate of 0.1, which would then look like this:
ntrees <- c(100, 200, 300, 400)
learn_rate <- c(1, 0.5, 0.1)
for (n in ntrees){
  for (l in learn_rate){
    if (l == 0.1 && n > 100) next  # Skip when n is 200, 300, 400
    gbm_model <- h2o.gbm(
      features, label, training_frame = train, validation_frame = valid,
      ntrees = n, max_depth = 5, learn_rate = l
    )
    print(c(n, l, h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
  }
}
Upvotes: 0
Reputation: 5778
You can use h2o.grid() to do your grid search:
# specify your hyperparameters
hyper_params = list(ntrees = c(100, 200, 300, 400), learn_rate = c(1, 0.5, 0.1))

# then build your grid
grid <- h2o.grid(
  ## hyperparameters
  hyper_params = hyper_params,
  ## which algorithm to run
  algorithm = "gbm",
  ## identifier for the grid, to later retrieve it
  grid_id = "my_grid",
  ## standard model parameters
  x = features,
  y = label,
  training_frame = train,
  validation_frame = valid,
  ## set a seed for reproducibility
  seed = 1234)
You can read more about how h2o.grid() works in the R documentation: http://docs.h2o.ai/h2o/latest-stable/h2o-r/h2o_package.pdf
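Once the grid has finished building, you can retrieve it sorted by a validation metric and pull out the best model. A short sketch, assuming the grid_id = "my_grid" from above and a running H2O cluster:

```r
# retrieve the grid, with models sorted by validation MSE (ascending)
sorted_grid <- h2o.getGrid(grid_id = "my_grid", sort_by = "mse", decreasing = FALSE)

# the summary table lists each ntrees / learn_rate combination with its MSE
print(sorted_grid@summary_table)

# the first model id belongs to the best-scoring combination
best_model <- h2o.getModel(sorted_grid@model_ids[[1]])
print(h2o.mse(h2o.performance(best_model, valid = TRUE)))
```

This replaces the manual print() bookkeeping in the nested-loop approach: the grid keeps every model and its scores for you.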
Upvotes: 2