Reputation: 442
I put my predictor variables in the grid search below. As far as I understand, this grid search selects the best variables to use in the model and discards the others. However, I do not know which algorithm or selection metric it uses to pick the best variables. Can somebody tell me how it decides which variables to keep and which to throw away?
The function:
grid.f <- h2o.grid(algorithm = "glm", # Algorithm type
grid_id = "grid.f", # ID so retrieving information on iterations is easier later
x = predictors, # Predictor features
y = response, # Target variable
training_frame = data, # Training set
hyper_params = hyper_parameters, # Alpha values to iterate over
remove_collinear_columns = T, # Remove collinear columns
lambda_search = T, # Search for the optimal lambda value
seed = p.seed, # Seed to ensure reproducible results
keep_cross_validation_predictions = F, # Do not keep cross-validation predictions
compute_p_values = F, # Do not compute p-values of the coefficients
family = family, # Distribution family used
standardize = T, # Standardize continuous variables
nfolds = p.folds, # Number of cross-validation folds
#max_active_predictors = p.max, # Limit on the number of active features
fold_assignment = "Modulo", # Fold assignment scheme for cross-validation
link = p.link) # Link function for the distribution
Upvotes: 0
Views: 1275
Reputation: 3671
Even without grid search, H2O-3's GLM uses L1 regularization (aka "lasso") to figure out which variables it can penalize out of the model.
Elastic net is a blend of the L1 (lasso) and L2 (ridge regression) penalties, controlled by the alpha and lambda parameters: alpha sets the mix between L1 and L2, and lambda sets the overall penalty strength.
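To see why L1 can "penalize out" variables entirely, here is a minimal, self-contained sketch (not H2O's actual solver) of the soft-thresholding operator that lasso-style coordinate descent applies to each coefficient; the coefficient values and names below are made up for illustration:

```python
def soft_threshold(coef, penalty):
    """Closed-form L1 update: shrink toward zero; small coefficients become exactly 0."""
    if coef > penalty:
        return coef - penalty
    if coef < -penalty:
        return coef + penalty
    return 0.0  # coefficient is zeroed out -> variable is dropped from the model

# Hypothetical unpenalized coefficient estimates
coefs = {"x1": 2.5, "x2": 0.3, "x3": -1.8, "x4": -0.05}
lam = 0.5  # lambda: strength of the L1 penalty

penalized = {name: soft_threshold(b, lam) for name, b in coefs.items()}
kept = [name for name, b in penalized.items() if b != 0.0]
print(penalized)  # x2 and x4 shrink to exactly 0.0
print(kept)       # ['x1', 'x3']
```

This is the mechanism behind the selection: variables whose coefficients cannot "pay for" the penalty are set to exactly zero, and lambda_search tries a sequence of lambda values to find how strong that penalty should be.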
The H2O GLM booklet is a good reference for the details:
Upvotes: 2