Reputation: 442
I put my predictor variables in the grid search below. As far as I understand, this grid search selects the best variables to use in the model and discards the others. However, I do not know which algorithm or selection metric it uses to pick the best variables. Can somebody tell me how it decides which variables to keep and which to throw away?
The function:
grid.f <- h2o.grid(algorithm = "glm", # Algorithm type
grid_id = "grid.f", # ID so retrieving information on iterations is easier later
x = predictors, # Predictor features
y = response, # Target variable
training_frame = data, # Training set
hyper_params = hyper_parameters, # Alpha values to iterate over
remove_collinear_columns = T, # Remove collinear columns
lambda_search = T, # Search for the optimal lambda value
seed = p.seed, # Seed to ensure reproducible results
keep_cross_validation_predictions = F, # Do not keep cross-validation predictions
compute_p_values = F, # Do not compute p-values of the coefficients
family = family, # Distribution family used
standardize = T, # Standardize continuous variables
nfolds = p.folds, # Number of cross-validation folds
#max_active_predictors = p.max, # Limit on the number of active features
fold_assignment = "Modulo", # Fold assignment scheme for cross-validation
link = p.link) # Link function for the distribution
Upvotes: 0
Views: 1275
Reputation: 3671
Even without grid search, H2O-3's GLM uses L1 regularization (aka "lasso") to figure out which variables it can penalize out of the model.
Elastic net is a blend of the L1 (lasso) and L2 (ridge regression) penalties, controlled by the alpha and lambda parameters: alpha sets the mix between L1 and L2, and lambda sets the overall penalty strength.
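To see why L1 can "penalize out" variables entirely, here is a minimal, self-contained sketch (not H2O's actual solver) of the soft-thresholding operator that lasso-style coordinate descent applies to each coefficient; the coefficient values and names below are made up for illustration:

```python
def soft_threshold(coef, penalty):
    """Closed-form L1 update: shrink toward zero; small coefficients become exactly 0."""
    if coef > penalty:
        return coef - penalty
    if coef < -penalty:
        return coef + penalty
    return 0.0  # coefficient is zeroed out -> variable is dropped from the model

# Hypothetical unpenalized coefficient estimates
coefs = {"x1": 2.5, "x2": 0.3, "x3": -1.8, "x4": -0.05}
lam = 0.5  # lambda: strength of the L1 penalty

penalized = {name: soft_threshold(b, lam) for name, b in coefs.items()}
kept = [name for name, b in penalized.items() if b != 0.0]
print(penalized)  # x2 and x4 shrink to exactly 0.0
print(kept)       # ['x1', 'x3']
```

This is the mechanism behind the selection: variables whose coefficients cannot "pay for" the penalty are set to exactly zero, and lambda_search tries a sequence of lambda values to find how strong that penalty should be.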
The H2O GLM booklet is a good reference for the details:
Upvotes: 2