cpaul12

Reputation: 41

How to receive same predictions from loaded XGBClassifier model?

Every time I re-run a Python script that predicts on the same test data (without retraining the model), I seem to get different predictions. This happens even though I set the seed parameter on the model before training.

I train and save the model in a notebook, as shown below:

import xgboost

# X_train and y_train are prepared earlier in the notebook
model = xgboost.XGBClassifier(n_estimators=100, max_depth=8, n_jobs=-1, eval_metric='auc', seed=42)
model.fit(X_train, y_train)
model.save_model("../models/xgbclassifier_01.txt")

From there, I load this model in another script and make predictions on new input data:

import xgboost as xgb

# path points to the model file saved in the notebook
clf = xgb.XGBClassifier()
clf.load_model(path)
state_pred1 = clf.predict(X_test)

# load and predict again to show that results are the same
clf2 = xgb.XGBClassifier()
clf2.load_model(path)
state_pred2 = clf2.predict(X_test)

Within a single run, state_pred1 and state_pred2 are equal.

The problem is that each time I re-run the test script, without retraining the model, I get prediction values that differ from the previous run (although state_pred1 and state_pred2 are still equal to each other).

Is there a way to ensure that I receive the same predictions every time I run the script? Or are there random parameters within the .predict() method of the XGBClassifier model that introduce some stochasticity into predictions every time a script that loads an XGB model is re-run?

Upvotes: 2

Views: 780

Answers (1)

cpaul12

Reputation: 41

The issue was resolved by ensuring that the feature columns stay in the same order between training and prediction. The order of the features was accidentally being shuffled during data preparation at the beginning of the prediction script, so the model received values in the wrong columns and the predictions changed from run to run.
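
For anyone hitting the same thing, here is a minimal sketch of one way to guard against it, assuming the inputs are pandas DataFrames. The helper align_features and the column names are hypothetical and only for illustration; in practice the expected order could be stored at training time (for example as list(X_train.columns)) and reused in the prediction script.

```python
import pandas as pd


def align_features(X: pd.DataFrame, feature_order: list) -> pd.DataFrame:
    """Return X with its columns in the exact order used at training time.

    Hypothetical helper: raises if the column sets differ, rather than
    silently predicting on misaligned data.
    """
    missing = [c for c in feature_order if c not in X.columns]
    extra = [c for c in X.columns if c not in feature_order]
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")
    return X[feature_order]


# Hypothetical column names; at training time this would be list(X_train.columns).
train_columns = ["age", "income", "score"]

# Prediction-time data whose columns arrived in a different order.
X_test = pd.DataFrame(
    {"score": [0.5, 0.9], "age": [31, 47], "income": [40000, 82000]}
)

X_aligned = align_features(X_test, train_columns)
print(list(X_aligned.columns))
```

Calling clf.predict(X_aligned) instead of clf.predict(X_test) would then feed the model the columns in the order it was trained on, regardless of how the data preparation step happened to arrange them.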

Upvotes: 2
