Reputation: 1521
I see there are a few places where we can set a seed when doing a grid search to tune hyperparameters. For example, we can set a seed in the following 3 places.
Are these 3 redundant, so that we only need to set one of them, or does each of them play a different role?
Thanks!
Upvotes: 0
Views: 1147
Reputation: 5778
There are two places where you can specify a seed when using the Python API:
1) The Estimator. Let's take GBM as the example:
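from h2o.estimators.gbm import H2OGradientBoostingEstimator
# features, response and train are assumed to be defined already
gbm = H2OGradientBoostingEstimator(nfolds=5, seed=1234)
gbm.train(x=features, y=response, training_frame=train)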
Notice that I don't specify a seed within the train method. If you pass a seed argument to train, it will break.
The API docs show that the Estimator's train method takes no seed argument:
train(x=None, y=None, training_frame=None, offset_column=None, fold_column=None, weights_column=None, validation_frame=None, max_runtime_secs=None, ignored_columns=None, model_id=None, verbose=False)
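Why an extra seed keyword fails can be checked without a running cluster. The sketch below uses a stand-in function, train_stub (a hypothetical name, not part of h2o), that only copies the parameter list quoted above. It shows the general Python behaviour: a keyword that is not in the signature raises a TypeError.

```python
import inspect


def train_stub(x=None, y=None, training_frame=None, offset_column=None,
               fold_column=None, weights_column=None, validation_frame=None,
               max_runtime_secs=None, ignored_columns=None, model_id=None,
               verbose=False):
    """Hypothetical stand-in that only mirrors the quoted train() parameters."""
    return "trained"


# Collect the parameter names the stand-in accepts.
accepted = list(inspect.signature(train_stub).parameters)
has_seed = "seed" in accepted

# A keyword that is not in the signature raises TypeError.
try:
    train_stub(x=["a"], y="b", seed=1234)
    seed_rejected = False
except TypeError:
    seed_rejected = True
```

This only illustrates the mechanism. The exact error raised by the real Estimator may differ.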
From the documentation, here is the definition of an Estimator's seed:
This option specifies the random number generator (RNG) seed for algorithms that are dependent on randomization. When a seed is defined, the algorithm will behave deterministically. The seed is consistent for each H2O instance so that you can create models with the same starting conditions in alternative configurations.
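To make "behave deterministically" concrete, here is a minimal sketch in plain Python. It is not H2O's implementation. It assumes, for illustration only, that one randomized step is assigning rows to cross-validation folds, and assign_folds is a hypothetical helper.

```python
import random


def assign_folds(n_rows, nfolds, seed):
    """Randomly assign each row index to one of nfolds folds (illustrative only)."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [rng.randrange(nfolds) for _ in range(n_rows)]


# Two runs with the same seed produce exactly the same fold assignment.
first = assign_folds(n_rows=20, nfolds=5, seed=1234)
second = assign_folds(n_rows=20, nfolds=5, seed=1234)
same_seed_matches = first == second
```

With the seed fixed, anything computed from those folds starts from the same conditions on every run, which is the property the quoted definition describes.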
2) The search_criteria in H2OGridSearch. From the docs:
More about search_criteria: This is a dictionary of control parameters for smarter hyperparameter search. The dictionary can include values for: strategy, max_models, max_runtime_secs, stopping_metric, stopping_tolerance, stopping_rounds and seed. The default value for strategy, “Cartesian”, covers the entire space of hyperparameter combinations. If you want to use cartesian grid search, you can leave the search_criteria argument unspecified. Specify the “RandomDiscrete” strategy to perform a random search of all the combinations of your hyperparameters. RandomDiscrete should be usually combined with at least one early stopping criterion, max_models and/or max_runtime_secs. Some examples below:
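The difference between the two strategies, and what the seed in search_criteria controls, can be sketched in plain Python. This is not H2O's code. The grid values and the helper names cartesian and random_discrete are hypothetical, and max_models is modelled simply as a cap on how many combinations get picked.

```python
import itertools
import random

# Hypothetical hyperparameter grid, for illustration only.
hyper_params = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1], "ntrees": [50, 100]}


def cartesian(grid):
    """Every combination of the grid, in a fixed order."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[n] for n in names))]


def random_discrete(grid, max_models, seed):
    """A seeded random sample of combinations, capped at max_models."""
    combos = cartesian(grid)
    rng = random.Random(seed)
    return rng.sample(combos, min(max_models, len(combos)))


all_models = cartesian(hyper_params)  # 3 * 2 * 2 = 12 combinations
run_a = random_discrete(hyper_params, max_models=4, seed=1234)
run_b = random_discrete(hyper_params, max_models=4, seed=1234)  # same picks as run_a
```

In this sketch the seed only decides which combinations are selected and in what order. It says nothing about the randomness inside each model, which is why the Estimator seed is a separate setting.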
While you can pass a seed parameter to the train method for grid search without anything breaking, the seed parameter there does nothing. If you want reproducible grid search runs, you need to specify the seed in the search_criteria parameter, like so:
from h2o.grid.grid_search import H2OGridSearch

# build grid search with previously made GBM and hyper parameters
grid = H2OGridSearch(model=my_model, hyper_params=hyper_params,
                     search_criteria={"strategy": "RandomDiscrete", "max_runtime_secs": 10, "seed": 1234})
# train using the grid
grid.train(x=predictors, y=response, training_frame=train, validation_frame=valid)
Upvotes: 1