Reputation: 9194
I am probably looking right over it in the documentation, but I wanted to know if there is a way with XGBoost to generate both the prediction and the probability for that prediction. In my case, I am predicting a multi-class target. It would be great if I could return something like Medium - 88%.
Parameters:
params = {
'max_depth': 3,
'objective': 'multi:softmax', # multiclass classification using the softmax objective
'num_class': 3,
'n_gpus': 0
}
Prediction:
pred = model.predict(D_test)
Results:
array([2., 2., 1., ..., 1., 2., 2.], dtype=float32)
User-friendly labels (via the label encoder):
pred_int = pred.astype(int)
label_encoder.inverse_transform(pred_int[:5])
array(['Medium', 'Medium', 'Low', 'Low', 'Medium'], dtype=object)
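For reference, the target output can be assembled from any per-class probability array. A minimal sketch with made-up numbers, where the classes array stands in for label_encoder.classes_ (the real values would come from the fitted encoder):

```python
import numpy as np

# Hypothetical per-class probabilities, shape (n_samples, num_class)
proba = np.array([[0.05, 0.07, 0.88],
                  [0.10, 0.85, 0.05]])
classes = np.array(['High', 'Low', 'Medium'])  # stand-in for label_encoder.classes_

# Predicted class is the argmax; its probability is the row max
pred_idx = proba.argmax(axis=1)
for label, p in zip(classes[pred_idx], proba.max(axis=1)):
    print(f"{label} - {p:.0%}")
# Medium - 88%
# Low - 85%
```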
EDIT: @Reveille suggested predict_proba. I am not instantiating XGBClassifier(); should I be? If so, how would I modify my pipeline to use it?
params = {
'max_depth': 3,
'objective': 'multi:softmax', # multiclass classification using the softmax objective
'num_class': 3,
'n_gpus': 0
}
steps = 20 # The number of training iterations
model = xgb.train(params, D_train, steps)
Upvotes: 19
Views: 52746
Reputation: 4629
You can try pred_p = model.predict_proba(D_test)
An example I had around (not multi-class though):
import xgboost as xgb
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
xgb_clf = xgb.XGBClassifier()
xgb_clf = xgb_clf.fit(X_train, y_train)
print(xgb_clf.predict(X_test))
print(xgb_clf.predict_proba(X_test))
[1 1 1 0 1 0 1 0 0 1]
[[0.0394336 0.9605664 ]
[0.03201818 0.9679818 ]
[0.1275925 0.8724075 ]
[0.94218 0.05782 ]
[0.01464975 0.98535025]
[0.966953 0.03304701]
[0.01640552 0.9835945 ]
[0.9297296 0.07027044]
[0.9580196 0.0419804 ]
[0.02849442 0.9715056 ]]
Note, as mentioned in the comments by @scarpacci (ref): the predict_proba() method only exists for the scikit-learn interface.
Upvotes: 25