Clarifications on GridSearchCV in sklearn

I have the following questions regarding GridSearchCV in sklearn. I searched but could not find clear answers. Below is the code snippet I am using:

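import xgboost as xgb
from sklearn.model_selection import GridSearchCV, train_test_split

# df2 is an existing DataFrame; 'responder_flag' is the binary target
dep = df2['responder_flag']
# drop the columns at positions 0 and 85 to get the features
indep = df2.drop(df2.columns[[0, 85]], axis=1)

X_train, X_test, y_train, y_test = train_test_split(indep, dep, test_size=0.25, random_state=23)

train = xgb.XGBClassifier(objective='binary:logistic')
param_grid = {'max_depth': [4, 5], 'n_estimators': [500], 'learning_rate': [0.02, 0.01]}
grid = GridSearchCV(train, param_grid, cv=5, scoring='roc_auc')
grid.fit(X_train, y_train)
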
  1. Is the cross-validation set by the cv parameter in GridSearchCV equivalent to KFold or other CV techniques applied explicitly with cross_val_score and similar functions when training the model?

  2. Can I use GridSearchCV just for cross-validation? If I do not provide multiple values per parameter, will it be equivalent to a plain cross-validation technique?

  3. Once grid.fit(X_train, y_train) has executed, is the model already trained on the best parameters found, so that it can be used for prediction directly? Or do I need to define another estimator with grid.best_params_, train it, and use that for prediction?

Apologies if these have been answered before.

Upvotes: 0

Views: 380

Answers (1)

Saurabh Jain

Reputation: 1712

Here are the answers:

  1. Yes, an integer cv parameter is equivalent to k-fold (for a classifier such as this one, sklearn uses stratified k-fold).
    In GridSearchCV, we give a set of values that we want each parameter to take. Let's say the grid for learning_rate is [0.0001, 0.001, 0.01, 0.1, 1, 10]. When we specify cv=5, GridSearchCV performs 5-fold CV for learning_rate = 0.0001, then 5-fold CV for each of the remaining values. With several parameters in the grid, it does this for every combination. k in this case is 5.

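  2. In a sense, yes. A grid with a single value per parameter makes GridSearchCV run plain k-fold cross-validation for that one setting. But don't do it: GridSearchCV is a method for hyper-parameter tuning, and using it without multiple candidate values defeats its purpose. If you only want a cross-validated score, cross_val_score does that directly.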

  3. Manually fitting a new model with grid.best_params_ on the training set after grid.fit(X_train, y_train) is not necessary. GridSearchCV has a parameter called refit which, when set to True, automatically refits the best parameter combination on the whole training set and stores the result as grid.best_estimator_. It is True by default, so you can call grid.predict(X_test) directly. See the GridSearchCV documentation.
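
The three points above can be checked with a small sketch. It is not the asker's setup: the data is synthetic (make_classification) because df2 is not available, and GradientBoostingClassifier stands in for xgb.XGBClassifier so that only scikit-learn is needed. The variable names (single, manual, same_score, proba) are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for df2 (assumption: the real data is not available here).
X, y = make_classification(n_samples=400, n_features=10, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=23
)

# Stand-in for xgb.XGBClassifier; fixed random_state keeps every fit repeatable.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Point 2: a grid with one value per parameter is plain 5-fold CV.
single = GridSearchCV(model, {"max_depth": [4]}, cv=5, scoring="roc_auc")
single.fit(X_train, y_train)
grid_mean = single.cv_results_["mean_test_score"][0]

# Point 1: the same evaluation done explicitly with cross_val_score.
manual = cross_val_score(
    GradientBoostingClassifier(n_estimators=50, max_depth=4, random_state=0),
    X_train, y_train, cv=5, scoring="roc_auc",
)
same_score = bool(np.isclose(grid_mean, manual.mean()))

# Point 3: with refit=True (the default) the fitted search object already
# holds a model trained on all of X_train with the best parameters.
param_grid = {"max_depth": [4, 5], "learning_rate": [0.02, 0.01]}
grid = GridSearchCV(model, param_grid, cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)
proba = grid.predict_proba(X_test)[:, 1]  # no second estimator needed

print("GridSearchCV and cross_val_score agree:", same_score)
print("Best parameters:", grid.best_params_)
```

Both calls use cv=5 on the same classifier and labels, so they build the same folds, which is why the mean scores should match. grid.best_estimator_ is the refitted model itself if you need to inspect or save it.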

Upvotes: 1
