Reputation: 3
I have the following questions regarding GridSearchCV in sklearn. I searched but could not find clear answers.
Below is the code snippet I am using:
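import xgboost as xgb
from sklearn.model_selection import GridSearchCV, train_test_split

# Target column and features (columns 0 and 85 are dropped from the features)
dep = df2['responder_flag']
indep = df2.drop(df2.columns[[0, 85]], axis=1)
X_train, X_test, y_train, y_test = train_test_split(indep, dep, test_size=0.25, random_state=23)

# 5-fold cross-validated grid search over 2 x 1 x 2 = 4 parameter combinations
train = xgb.XGBClassifier(objective='binary:logistic')
param_grid = {'max_depth': [4, 5], 'n_estimators': [500], 'learning_rate': [0.02, 0.01]}
grid = GridSearchCV(train, param_grid, cv=5, scoring='roc_auc')
grid.fit(X_train, y_train)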
Is cross-validation, i.e. the cv parameter in GridSearchCV, equivalent to KFold or other CV techniques applied explicitly through cross_val_score and similar functions when training on the data?
Can I use GridSearchCV for just cross-validation? Say I do not provide multiple values per parameter: will it be equivalent to a plain cross-validation run?
Once the grid.fit(X_train, y_train) statement has executed, is the model trained with the best parameters identified, so that it can be used for prediction directly? Or do I need to define another estimator with grid.best_params_, then train it and use that for prediction?
Apologies if these have been answered before.
Upvotes: 0
Views: 380
Reputation: 1712
Here are the answers:
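The cv parameter is equivalent to k-fold: an integer cv=k sets the number of folds. For a classifier with a binary or multiclass target, as in the question, sklearn uses stratified k-fold (StratifiedKFold) by default; otherwise it uses plain KFold.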
In GridSearchCV, we give a set of values for each parameter that we want the model to try. Let's say learning_rate takes values from [0.0001, 0.001, 0.01, 0.1, 1, 10]. When we specify cv=5 in GridSearchCV, it will perform 5-fold CV for 0.0001. Similarly, it will also perform 5-fold CV for each of the remaining values. k in this case is 5.
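To make the counting concrete, here is a minimal sketch. It assumes synthetic data from make_classification and a LogisticRegression in place of the asker's df2 and XGBClassifier, which are not available here. It checks that cv=5 gives the same scores as passing StratifiedKFold(n_splits=5) explicitly, and that every candidate value gets its own 5 folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic binary data; a stand-in for the asker's df2 (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=23)

param_grid = {"C": [0.01, 0.1, 1.0]}  # 3 candidate values

# cv=5 with a classifier: 5 stratified folds, no shuffling.
grid_int = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                        cv=5, scoring="roc_auc")
grid_int.fit(X, y)

# The same search with the splitter spelled out explicitly.
grid_skf = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                        cv=StratifiedKFold(n_splits=5), scoring="roc_auc")
grid_skf.fit(X, y)

n_candidates = len(grid_int.cv_results_["params"])
n_splits = grid_int.n_splits_
same_scores = np.allclose(grid_int.cv_results_["mean_test_score"],
                          grid_skf.cv_results_["mean_test_score"])

# Each candidate is scored on every fold: candidates x folds model fits,
# plus one final refit on the full data when refit=True.
print(n_candidates, n_splits, n_candidates * n_splits, same_scores)
```

If a different splitting scheme is wanted (shuffled folds, grouped folds, and so on), any sklearn splitter object can be passed as cv in the same way.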
In a sense, yes. But don't do it: GridSearchCV is a method for hyper-parameter tuning and expects a parameter grid. If you give it only one value per parameter, it just runs a single cross-validation, which defeats the purpose of using a grid search. cross_val_score does that job directly.
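A small sketch of the "in a sense, yes" part, again assuming synthetic data and a LogisticRegression rather than the asker's setup: a grid with a single candidate reports the same mean score as cross_val_score with the same cv and scoring.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic binary data; a stand-in for the asker's df2 (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=23)
model = LogisticRegression(C=1.0, max_iter=1000)

# Direct route: plain 5-fold cross-validation, one score per fold.
fold_scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")

# Roundabout route: a "grid" containing exactly one candidate.
grid = GridSearchCV(model, {"C": [1.0]}, cv=5, scoring="roc_auc")
grid.fit(X, y)

# best_score_ is the mean cross-validated score of the only candidate.
same_mean = bool(np.isclose(fold_scores.mean(), grid.best_score_))
print(fold_scores.mean(), grid.best_score_, same_mean)
```

The direct route is shorter and states the intent more clearly, which is the practical reason to prefer it when no tuning is involved.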
Fitting the model manually with grid.best_params_ on the training set after grid.fit(X_train, y_train) has completed is not necessary. GridSearchCV has a parameter called refit which, when refit=True, automatically refits an estimator with the best parameters on the whole training set and stores it as grid.best_estimator_. It is set to True by default, so grid.predict can be used directly. See the GridSearchCV documentation.
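The refit behaviour can be checked with a short sketch. It assumes synthetic data and a DecisionTreeClassifier with a fixed random_state (so that training is deterministic) instead of the asker's XGBClassifier. The grid object predicts directly, and a manually rebuilt estimator with grid.best_params_ gives the same predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data; a stand-in for the asker's df2 (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=23)

# refit=True is the default, so it is not passed explicitly.
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {"max_depth": [2, 4]}, cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)

# The grid object delegates to best_estimator_, already fit on all of X_train.
pred_grid = grid.predict(X_test)

# The manual route the question asks about: rebuild and retrain.
manual = DecisionTreeClassifier(random_state=0, **grid.best_params_)
manual.fit(X_train, y_train)
pred_manual = manual.predict(X_test)

same_predictions = bool(np.array_equal(pred_grid, pred_manual))
print(grid.best_params_, same_predictions)
```

With refit=False, best_estimator_ is not created and grid.predict is unavailable, so the manual route is only needed in that case.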
Upvotes: 1