Reputation: 575
XGBoost works fine on both CPU and GPU, but as soon as I add scikit-learn's RandomizedSearchCV for hyperparameter tuning it fails.
System: Ubuntu 20
Environment: conda virtual env with Python 3.7
xgboost install: conda install -c anaconda py-xgboost-gpu
Code:
from sklearn.model_selection import cross_val_score, RandomizedSearchCV, train_test_split
import xgboost as xgb
from scipy.stats import uniform, randint
xgb_model = xgb.XGBRegressor(objective="reg:squarederror")
params = {}
params['eval_metric'] = 'rmse'
params['tree_method'] = 'gpu_hist'
params['colsample_bytree'] = uniform(0.7, 0.3)
params['gamma'] = uniform(0, 0.5)
params['learning_rate'] = uniform(0.03, 0.3)
params['max_depth'] = randint(2,6)
params['n_estimators'] = randint(100, 150)
params['subsample'] = uniform(0.6, 0.4)
search = RandomizedSearchCV(xgb_model, param_distributions=params, random_state=42, n_iter=200, cv=3, verbose=1, return_train_score=True) #n_jobs=8,
search.fit(X_train, y_train)
print(search)
Error:
Fitting 3 folds for each of 200 candidates, totalling 600 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
/home/polabs1/anaconda3/envs/PoEnv_XGB_gpu/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:552: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/polabs1/anaconda3/envs/PoEnv_XGB_gpu/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 531, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/home/polabs1/anaconda3/envs/PoEnv_XGB_gpu/lib/python3.7/site-packages/xgboost/sklearn.py", line 396, in fit
callbacks=callbacks)
File "/home/polabs1/anaconda3/envs/PoEnv_XGB_gpu/lib/python3.7/site-packages/xgboost/training.py", line 216, in train
xgb_model=xgb_model, callbacks=callbacks)
File "/home/polabs1/anaconda3/envs/PoEnv_XGB_gpu/lib/python3.7/site-packages/xgboost/training.py", line 74, in _train_internal
bst.update(dtrain, i, obj)
File "/home/polabs1/anaconda3/envs/PoEnv_XGB_gpu/lib/python3.7/site-packages/xgboost/core.py", line 1109, in update
dtrain.handle))
File "/home/polabs1/anaconda3/envs/PoEnv_XGB_gpu/lib/python3.7/site-packages/xgboost/core.py", line 176, in _check_call
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: Invalid Input: 's', valid values are: {'approx', 'auto', 'exact', 'gpu_exact', 'gpu_hist', 'hist'}
Thanks guys.
Upvotes: 1
Views: 1549
Reputation: 2361
The param_distributions argument needs to be a dictionary of lists / arrays of candidate values (or distributions to sample from). Because you passed the eval_metric and tree_method values as bare strings, RandomizedSearchCV iterates over them character by character, so your current code is effectively interpreted as
params['eval_metric'] = ['r', 'm', 's', 'e']
params['tree_method'] = ['g', 'p', 'u', '_', 'h', 'i', 's', 't']
To fix it, replace the relevant lines with
params['eval_metric'] = ['rmse']
params['tree_method'] = ['gpu_hist']
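For reference, here is a minimal sketch of the full search from the question with those two entries wrapped in single-element lists (X_train / y_train are assumed to exist as in your code):
# Sketch: same search as in the question, but the fixed string
# parameters are wrapped in single-element lists so RandomizedSearchCV
# samples them as whole strings instead of iterating their characters.
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
import xgboost as xgb

xgb_model = xgb.XGBRegressor(objective="reg:squarederror")

params = {
    'eval_metric': ['rmse'],        # list, not bare string
    'tree_method': ['gpu_hist'],    # list, not bare string
    'colsample_bytree': uniform(0.7, 0.3),
    'gamma': uniform(0, 0.5),
    'learning_rate': uniform(0.03, 0.3),
    'max_depth': randint(2, 6),
    'n_estimators': randint(100, 150),
    'subsample': uniform(0.6, 0.4),
}

search = RandomizedSearchCV(xgb_model, param_distributions=params,
                            random_state=42, n_iter=200, cv=3,
                            verbose=1, return_train_score=True)
search.fit(X_train, y_train)  # X_train / y_train as in the question
Since tree_method is not actually being searched over, it could alternatively be set directly on the XGBRegressor itself and dropped from param_distributions.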
Upvotes: 2