Reputation: 1285
I'm new to XGBoost and parameter tuning, but I'm hoping you can help me.
I'm following along this tutorial:
https://www.datacamp.com/community/tutorials/xgboost-in-python
and near the cross validation section it suggests implementing a grid search to fine-tune the hyperparameters.
I was able to use GridSearchCV to return a best_estimator, whose parameters look like this:
XGBRegressor(alpha=5, base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.4, gamma=0,
importance_type='gain', learning_rate=0.1, max_delta_step=0,
max_depth=5, min_child_weight=1, missing=None, n_estimators=50,
n_jobs=1, nthread=None, objective='reg:squarederror',
random_state=123, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
seed=None, silent=None, subsample=1, verbosity=1)
My question is: if I want to pass some of these now-tuned parameters into the next cross-validation step, how can I pull the appropriate values out of the estimator above and into the function below?
params = {"objective":"reg:squarederror",'colsample_bytree': **THE COLSAMPLE_BYTREE VALUE FROM ABOVE**,'learning_rate': 0.1,
'max_depth': **THE MAX_DEPTH VALUE FROM ABOVE**, 'alpha': **THE ALPHA VALUE FROM ABOVE**}
cv_results = xgb.cv(dtrain=data_dmatrix, params=params, nfold=3,
num_boost_round=50,early_stopping_rounds=10,metrics="rmse", as_pandas=True, seed=123)
...or is this a silly question? I'm just trying to get some practice with these new skills using the Boston multivariate housing dataset. If you want to see my code, just let me know, but I'm hoping this will be enough. Thanks!
Upvotes: 0
Views: 2924
Reputation: 556
In my opinion, you do not need best_estimator_ for this task. You can use the best_params_ or best_index_ attribute instead to retrieve the parameters you are interested in.
best_params_ : dict
Parameter setting that gave the best results on the hold out data.
For multi-metric evaluation, this is present only if refit is specified.
best_index_ : int
The index (of the cv_results_ arrays) which corresponds to the best candidate parameter setting.
The dict at search.cv_results_['params'][search.best_index_] gives the parameter setting for the best model, that gives the highest mean score (search.best_score_).
For multi-metric evaluation, this is present only if refit is specified.
Then you can treat these params as a dictionary and put its key/value pairs in the proper places in your code, for example:
your_best_res = cv_.best_params_
'max_depth': your_best_res['max_depth'], ...
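For instance, here is a minimal sketch of plugging best_params_ into the params dict that xgb.cv expects. The dictionary values below are illustrative stand-ins for whatever your own grid search returned:

```python
# Illustrative stand-in for cv_.best_params_, which GridSearchCV
# exposes as a plain dict (values here are made up for the example)
best_params = {"colsample_bytree": 0.4, "max_depth": 5, "alpha": 5}

# Pull each tuned value out of the dict by key...
params = {
    "objective": "reg:squarederror",
    "learning_rate": 0.1,
    "colsample_bytree": best_params["colsample_bytree"],
    "max_depth": best_params["max_depth"],
    "alpha": best_params["alpha"],
}

# ...or merge the whole dict in one step with dict unpacking
params_merged = {"objective": "reg:squarederror", "learning_rate": 0.1, **best_params}

print(params == params_merged)  # → True
```

Either version of the dict can then be passed straight to xgb.cv(params=params, ...) exactly as in your question.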
Upvotes: 1