Reputation: 4118
I already know "xgboost.XGBRegressor
is a Scikit-Learn Wrapper interface for XGBoost."
But do they have any other difference?
Upvotes: 37
Views: 27498
Reputation: 51
@Danil suggests there are significant differences in speed, and @Mohammad correctly points out the need to convert the data into the DMatrix structure. So I have tried to replicate the benchmark in the Kaggle notebook environment.
The results showed no major training/prediction speed difference between the xgboost native API and the sklearn_wrapper.
```python
import numpy as np
import xgboost as xgb

xgb.__version__
```
'1.6.1'
```python
# training data
X = np.random.rand(240000, 348)
y = np.random.rand(240000)
```
```python
%%time
# convert training data
dtrain = xgb.DMatrix(X, label=y)
```
CPU times: user 3.61 s, sys: 505 ms, total: 4.12 s
Wall time: 1.56 s
```python
%%time
# train the model with default parameters
model = xgb.train({'objective':'reg:squarederror'},dtrain,10)
```
CPU times: user 6min 8s, sys: 700 ms, total: 6min 9s
Wall time: 1min 34s
```python
%%time
# predict with trained model
prediction = model.predict(dtrain)
```
CPU times: user 818 ms, sys: 1.01 ms, total: 819 ms
Wall time: 209 ms
```python
%%time
# train the sklearn wrapper with default parameters
model = xgb.XGBRegressor(n_estimators=10)
model.fit(X, y)
```
CPU times: user 6min 15s, sys: 1.2 s, total: 6min 16s
Wall time: 1min 37s

XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None, colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise', importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4, max_delta_step=0, max_depth=6, max_leaves=0, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=10, n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0, reg_lambda=1, ...)

```python
%%time
# predict with the fitted wrapper
prediction_1 = model.predict(X)
```
CPU times: user 1.48 s, sys: 1.99 ms, total: 1.48 s
Wall time: 380 ms
Upvotes: 2
Reputation: 79
In my opinion the main difference is the training/prediction speed.
For further reference I will call xgboost.train the 'native_implementation' and XGBClassifier.fit the 'sklearn_wrapper'.
I have made some benchmarks on a dataset of shape (240000, 348).
Fit/train time:
- sklearn_wrapper time = 89 seconds
- native_implementation time = 7 seconds

Prediction time:
- sklearn_wrapper = 6 seconds
- native_implementation = 3.5 milliseconds
I believe this is because the sklearn_wrapper is designed to take pandas/numpy objects as input, whereas the native_implementation needs the input data to be converted into an xgboost.DMatrix object.
In addition, one can optimise n_estimators with the native_implementation, as sketched below.
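To make that last point concrete, here is a minimal sketch (illustrative only, using random stand-in data rather than the benchmark set) of tuning the number of boosting rounds with the native API via xgb.cv and early_stopping_rounds:

```python
import numpy as np
import xgboost as xgb

# random stand-in data for the benchmark set
X = np.random.rand(10000, 20)
y = np.random.rand(10000)
dtrain = xgb.DMatrix(X, label=y)

# cross-validation with early stopping: boosting stops once the test metric
# has not improved for 10 consecutive rounds
cv_results = xgb.cv(
    {'objective': 'reg:squarederror'},
    dtrain,
    num_boost_round=500,
    nfold=3,
    early_stopping_rounds=10,
)

# the number of rows kept in the result is the tuned number of rounds
print(len(cv_results))
```

The tuned round count can then be reused as num_boost_round in xgb.train, or as n_estimators in the sklearn_wrapper.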
Upvotes: 5
Reputation: 836
@Maxim, as of xgboost 0.90 (or much before), these differences don't exist anymore, in that xgboost.XGBClassifier.fit:
- has callbacks
- allows continuation with the xgb_model parameter

What I find is different is evals_result, in that it has to be retrieved separately after fit (clf.evals_result()), and the resulting dict is different because it can't take advantage of the names of the evals in the watchlist (watchlist = [(d_train, 'train'), (d_valid, 'valid')]).
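A minimal sketch of that evals_result difference (illustrative code with random data, shown with XGBRegressor for brevity; the same applies to XGBClassifier):

```python
import numpy as np
import xgboost as xgb

X_train, y_train = np.random.rand(1000, 20), np.random.rand(1000)
X_valid, y_valid = np.random.rand(200, 20), np.random.rand(200)

# sklearn wrapper: results are retrieved after fit() and keyed generically
clf = xgb.XGBRegressor(n_estimators=10)
clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_valid, y_valid)],
        verbose=False)
print(clf.evals_result().keys())   # dict_keys(['validation_0', 'validation_1'])

# native API: the watchlist names are preserved in the evals_result dict
d_train = xgb.DMatrix(X_train, label=y_train)
d_valid = xgb.DMatrix(X_valid, label=y_valid)
watchlist = [(d_train, 'train'), (d_valid, 'valid')]
evals_result = {}
xgb.train({'objective': 'reg:squarederror'}, d_train, 10,
          evals=watchlist, evals_result=evals_result, verbose_eval=False)
print(evals_result.keys())         # dict_keys(['train', 'valid'])
```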
Upvotes: 14
Reputation: 53758
xgboost.train is the low-level API to train the model via the gradient boosting method.
xgboost.XGBRegressor and xgboost.XGBClassifier are the wrappers (Scikit-Learn-like wrappers, as they call them) that prepare the DMatrix and pass in the corresponding objective function and parameters. In the end, the fit call simply boils down to:
```python
self._Booster = train(params, dmatrix,
                      self.n_estimators, evals=evals,
                      early_stopping_rounds=early_stopping_rounds,
                      evals_result=evals_result, obj=obj, feval=feval,
                      verbose_eval=verbose)
```
This means that everything that can be done with XGBRegressor and XGBClassifier is doable via the underlying xgboost.train function. The other way around is obviously not true; for instance, some useful parameters of xgboost.train are not supported in the XGBModel API. The list of notable differences includes:
- xgboost.train allows setting the callbacks applied at the end of each iteration.
- xgboost.train allows training continuation via the xgb_model parameter.
- xgboost.train allows not only minimization of the eval function, but maximization as well (all three are illustrated in the sketch below the list).
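For concreteness, here is a minimal sketch of those three extra knobs on the native API (illustrative code with random data, assuming roughly the 1.6-era API used elsewhere on this page; note that feval is deprecated in favour of custom_metric in newer releases):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 20)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)
params = {'objective': 'reg:squarederror'}

# 1) callbacks applied at the end of each iteration
booster = xgb.train(
    params, dtrain, num_boost_round=50,
    evals=[(dtrain, 'train')],
    callbacks=[xgb.callback.EarlyStopping(rounds=5)],
    verbose_eval=False,
)

# 2) training continuation: stack 50 more rounds on top of the existing Booster
booster = xgb.train(params, dtrain, num_boost_round=50, xgb_model=booster)

# 3) maximize a custom eval metric instead of minimizing it
def neg_mae(preds, dmat):
    # returns (name, value); higher is better, hence maximize=True
    return 'neg_mae', -float(np.mean(np.abs(preds - dmat.get_label())))

booster = xgb.train(params, dtrain, num_boost_round=50,
                    evals=[(dtrain, 'train')],
                    feval=neg_mae, maximize=True, verbose_eval=False)
```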
Upvotes: 56