Reputation: 148
I need replicated runs of the h2o.gbm function that give different results with the same hyperparameters.
I've created a loop that runs each configuration twice and extracts the results of each h2o.gbm model with the h2o.performance function, but I've just realized that each pair of twin runs gives exactly the same results.
What would you suggest for getting different results from two h2o.gbm models with the same hyperparameters?
Things that I've tried:
All of these attempts failed: two runs with the same hyperparameters gave exactly the same results. Below is a sample hyperparameter configuration that I would like to produce different results when run twice.
h2o.gbm(x = x_col_names, y = y,
        training_frame = train_h2o,
        fold_column = "index_4seasons",
        ntrees = 1000,
        max_depth = 5,
        learn_rate = 0.1,
        stopping_rounds = 5,
        score_tree_interval = 10,
        seed = 1)
Any help and comment would be appreciated.
Upvotes: 0
Views: 48
Reputation: 930
Changing the seed value will change the results slightly. The example below, adapted from the docs, shows the MSE changing when only the seed differs.
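import h2o
from h2o.estimators import H2OGradientBoostingEstimator

# Start (or connect to) a local H2O cluster:
h2o.init()

# Import the prostate dataset into H2O:
train_h2o = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")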
# Set the predictors and response; set the factors:
train_h2o["CAPSULE"] = train_h2o["CAPSULE"].asfactor()
x_col_names = ["ID","AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
y = "CAPSULE"
# Build and train first model:
pros_gbm1 = H2OGradientBoostingEstimator(
    nfolds = 5, ntrees = 1000, max_depth = 5, learn_rate = 0.1,
    stopping_rounds = 5, score_tree_interval = 10, seed = 1)
pros_gbm1.train(x = x_col_names, y = y,
                training_frame = train_h2o)
# Build and train the second model with only seed number changed:
pros_gbm2 = H2OGradientBoostingEstimator(
    nfolds = 5, ntrees = 1000, max_depth = 5, learn_rate = 0.1,
    stopping_rounds = 5, score_tree_interval = 10, seed = 123456789)
pros_gbm2.train(x = x_col_names, y = y,
                training_frame = train_h2o)
print('Model 1 MSE:', pros_gbm1.mse())
print('Model 2 MSE:', pros_gbm2.mse())
Output
Model 1 MSE: 0.020725291770552916
Model 2 MSE: 0.02189654172905499
If your dataset gives identical results with different seeds and hardware settings, it may be that it is not large or complex enough for the models to behave stochastically. You can also try changing the folds in the fold_column to see if that has an effect.
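For the replicated-runs loop in the question, one way to apply this is to give each twin run its own seed instead of hard-coding seed = 1. The following is only a rough sketch: the helper names (make_replicate_seeds, run_replicates, h2o_gbm_mse) are made up for illustration and are not part of h2o, and it assumes a cluster is already running and that train_h2o, x_col_names and y are defined as in the example above.

```python
import random


def make_replicate_seeds(n_runs, base_seed=42):
    """Return n_runs distinct positive seeds, reproducible from base_seed."""
    rng = random.Random(base_seed)
    return rng.sample(range(1, 2**31 - 1), n_runs)


def run_replicates(train_fn, params, n_runs=2, base_seed=42):
    """Call train_fn(seed=..., **params) once per replicate seed.

    Returns a list of (seed, result) pairs so each run can be traced
    back to the seed that produced it.
    """
    results = []
    for seed in make_replicate_seeds(n_runs, base_seed):
        results.append((seed, train_fn(seed=seed, **params)))
    return results


def h2o_gbm_mse(seed, frame, x, y, **gbm_params):
    """Hypothetical training function: fit one GBM and return its MSE.

    Assumes h2o.init() has already been called and that frame is an
    H2OFrame; the import is local so the helpers above work without h2o.
    """
    from h2o.estimators import H2OGradientBoostingEstimator

    model = H2OGradientBoostingEstimator(seed=seed, **gbm_params)
    model.train(x=x, y=y, training_frame=frame)
    return model.mse()
```

With the prostate example it could be called as run_replicates(h2o_gbm_mse, dict(frame=train_h2o, x=x_col_names, y=y, nfolds=5, ntrees=1000, max_depth=5, learn_rate=0.1, stopping_rounds=5, score_tree_interval=10), n_runs=2). Keeping base_seed fixed means the whole experiment can still be repeated exactly, while the two runs inside it use different seeds.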
Upvotes: 1