Reputation: 41
Every time I re-run a Python script that predicts on the same test data (without retraining the model), I receive different prediction results. This happens even though I set the seed parameter on the model before training.
I train and save the model in a notebook, like below.
import xgboost

model = xgboost.XGBClassifier(n_estimators=100, max_depth=8, n_jobs=-1, eval_metric='auc', seed=42)
model.fit(X_train, y_train)
model.save_model("../models/xgbclassifier_01.txt")
From there, I load this model in another script and make predictions on new input data:
import xgboost as xgb

clf = xgb.XGBClassifier()
clf.load_model(path)
state_pred1 = clf.predict(X_test)

# load and predict again to show that results are the same within a single run
clf2 = xgb.XGBClassifier()
clf2.load_model(path)
state_pred2 = clf2.predict(X_test)
with the results of state_pred1 equal to state_pred2.
The problem is that whenever I re-run the test script, the predictions differ from those of the previous run, even though I never retrain the model (state_pred1 and state_pred2 are still equal to each other within any single run).
Is there a way to ensure that I receive the same predictions every time I run the script? Or are there random parameters within the .predict() method of XGBClassifier that introduce stochasticity into the predictions each time a script that loads an XGB model is re-run?
Upvotes: 2
Views: 780
Reputation: 41
The issue was resolved by ensuring that the feature columns remained identical between training and prediction: the order of the features was accidentally being shuffled during data preparation at the beginning of the prediction script, so the model silently read each column's values as a different feature on each run.
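One way to guard against this is to force the prediction-time DataFrame into the training-time column order before calling .predict(). The sketch below uses a hypothetical train_columns list (in practice you might save it alongside the model, or read it from the booster's feature names); the column names and values are made up for illustration:

```python
import pandas as pd

# Hypothetical feature order recorded at training time.
train_columns = ["age", "income", "score"]

# Prediction-time frame whose columns arrived in a shuffled order.
X_test = pd.DataFrame({"score": [0.1], "age": [35], "income": [52000]})

# Reindex so the column order matches training exactly; a feature
# missing at prediction time would surface as a NaN column instead
# of silently shifting every value one position over.
X_test = X_test.reindex(columns=train_columns)

print(list(X_test.columns))  # ['age', 'income', 'score']
```

With the columns pinned down like this, repeated runs of the prediction script feed the model byte-identical input, so the predictions are reproducible.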
Upvotes: 2