TX Shi

Reputation: 329

How to get the params from a saved XGBoost model

I'm trying to train an XGBoost model using the params below:

xgb_params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'lambda': 0.8,
    'alpha': 0.4,
    'max_depth': 10,
    'max_delta_step': 1,
    'verbose': True
}

Since my input data is too big to be fully loaded into memory, I adopted incremental training:

xgb_clf = xgb.train(xgb_params, input_data, num_boost_round=rounds_per_batch,
                    xgb_model=model_path)
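
Concretely, the batched loop looks roughly like this (a sketch, not my exact code; iter_batches is a hypothetical generator yielding (X, y) chunks from my data source):

import os
import xgboost as xgb

booster = None
for X_batch, y_batch in iter_batches():
    dtrain = xgb.DMatrix(X_batch, label=y_batch)
    booster = xgb.train(
        xgb_params, dtrain, num_boost_round=rounds_per_batch,
        # continue from the previously saved model, if any
        xgb_model=model_path if os.path.exists(model_path) else None)
    booster.save_model(model_path)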

The code for prediction is

xgb_clf = xgb.XGBClassifier()
booster = xgb.Booster()
booster.load_model(model_path)
xgb_clf._Booster = booster
raw_probas = xgb_clf.predict_proba(x)

The result seemed good. But when I tried to invoke xgb_clf.get_xgb_params(), I got a param dict in which all params were set to default values.

I can guess that the root cause is that when I initialized the model, I didn't pass any params in. So the model was initialized with the default values, but when it predicted, it used an internal booster that had been fitted with the pre-defined params.

However, I wonder whether there is any way that, after I assign a pre-trained booster to an XGBClassifier, I can see the real params that were used to train the booster, not those that were used to initialize the classifier.

Upvotes: 6

Views: 35703

Answers (3)

skybunk

Reputation: 863

If you are training like this -

dtrain = xgb.DMatrix(x_train, label=y_train)
model = xgb.train(model_params, dtrain, model_num_rounds)

Then the model returned is a Booster.

import json
json.loads(model.save_config())

The model.save_config() method returns the model's parameters, along with other configuration, as a JSON string.
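
For example, to pull out just the training parameters (a sketch; the exact key layout of the config JSON may differ between XGBoost versions):

import json

config = json.loads(model.save_config())
# in recent versions the training parameters sit under the "learner" key
train_params = config["learner"]["learner_train_param"]
print(train_params)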

Upvotes: 2

steadyfish

Reputation: 877

To add to @ytsaig's answer: if you use the early_stopping_rounds argument in the clf.fit() method, certain additional parameters are generated but are not returned by clf.get_xgb_params(). They can be accessed directly as clf.best_score, clf.best_iteration, and clf.best_ntree_limit (see the sketch below the reference link).

Ref: https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit
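
A minimal sketch on synthetic data (this assumes the older fit() signature the linked docs describe; in newer XGBoost releases early_stopping_rounds and eval_metric moved to the constructor, and best_ntree_limit was removed):

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25)

clf = xgb.XGBClassifier(n_estimators=100)
clf.fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
        eval_metric='auc', early_stopping_rounds=10, verbose=False)

# set only after fitting with early stopping; not part of get_xgb_params()
print(clf.best_score, clf.best_iteration, clf.best_ntree_limit)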

Upvotes: 1

ytsaig

Reputation: 3296

You seem to be mixing the sklearn API with the native (functional) API in your code. If you stick to either one, the parameters should persist in the pickle. Here's an example using the sklearn API.

import pickle
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_digits


digits = load_digits(n_class=2)
y = digits['target']
X = digits['data']

xgb_params = {
    'objective': 'binary:logistic',
    'reg_lambda': 0.8,
    'reg_alpha': 0.4,
    'max_depth': 10,
    'max_delta_step': 1,
}
clf = xgb.XGBClassifier(**xgb_params)
clf.fit(X, y, eval_metric='auc', verbose=True)

pickle.dump(clf, open("xgb_temp.pkl", "wb"))
clf2 = pickle.load(open("xgb_temp.pkl", "rb"))

assert np.allclose(clf.predict(X), clf2.predict(X))
print(clf2.get_xgb_params())

which produces

{'base_score': 0.5,
 'colsample_bylevel': 1,
 'colsample_bytree': 1,
 'gamma': 0,
 'learning_rate': 0.1,
 'max_delta_step': 1,
 'max_depth': 10,
 'min_child_weight': 1,
 'missing': nan,
 'n_estimators': 100,
 'objective': 'binary:logistic',
 'reg_alpha': 0.4,
 'reg_lambda': 0.8,
 'scale_pos_weight': 1,
 'seed': 0,
 'silent': 1,
 'subsample': 1}
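
As a side note, recent XGBoost versions (1.0+, which is an assumption you should check against your install) also give the sklearn wrapper its own save_model/load_model; in those versions this round-trips the booster together with the wrapper's configuration, which addresses the original question more directly:

clf.save_model("xgb_temp.json")
clf3 = xgb.XGBClassifier()
clf3.load_model("xgb_temp.json")  # restores the booster and, in recent versions, the params
print(clf3.get_xgb_params())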

Upvotes: 10
