Reputation: 511
I'm trying to solve a multiclass classification problem using the xgboost algorithm; however, I do not know how predict_proba
works exactly. In fact, predict_proba
generates a list of probabilities, but I don't know which class each probability corresponds to.
Here is a simple example:
This is my training data:
+------------+----------+-------+
| feature1 | feature2 | label |
+------------+----------+-------+
| x | z | 3 |
+------------+----------+-------+
| y | u | 0 |
+------------+----------+-------+
| x | u | 2 |
+------------+----------+-------+
Then I try to predict probabilities for a new example:
model.predict_proba(['x','u'])
This will return something like this:
[0.2, 0.3, 0.5]
My question is: which class has the probability of 0.5? Is it class 2, 3, or 0?
Upvotes: 6
Views: 25667
Reputation: 3223
It seems that you are using the sklearn API of xgboost. In this case the model has a dedicated attribute, model.classes_,
that returns the classes learned by the model. The probabilities returned by predict_proba follow the same order: the i-th probability belongs to model.classes_[i]. So if model.classes_ is [0 2 3] for your data, the 0.5 in your output belongs to class 3.
Here is an example with dummy data:
import numpy as np
import pandas as pd
import xgboost as xgb
# generate dummy data (10k examples, 10 numeric features, 4 classes of target)
np.random.seed(312)
train_X = np.random.random((10000,10))
train_y_mcc = np.random.randint(0, 4, train_X.shape[0]) #four classes:0,1,2,3
# model
xgb_model_mpg = xgb.XGBClassifier(max_depth=3, n_estimators=100)
xgb_model_mpg.fit(train_X, train_y_mcc)
# classes
print(xgb_model_mpg.classes_)
>>> [0 1 2 3]
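To read off which probability belongs to which class, you can pair the class labels with one row of probabilities position by position. Below is a minimal sketch in plain Python using the numbers from the question. The class order [0, 2, 3] is an assumption (what classes_ would hold if the labels are stored sorted), and label_probabilities is a hypothetical helper, not part of xgboost.

```python
def label_probabilities(classes, proba_row):
    """Pair each class label with the probability at the same position."""
    if len(classes) != len(proba_row):
        raise ValueError("expected exactly one probability per class")
    return {label: float(p) for label, p in zip(classes, proba_row)}


# Values taken from the question. The class order is an assumption:
# it stands in for model.classes_ on a model fitted on labels 0, 2 and 3.
classes = [0, 2, 3]
proba_row = [0.2, 0.3, 0.5]

by_label = label_probabilities(classes, proba_row)

# The predicted class is the label with the highest probability.
most_likely = max(by_label, key=by_label.get)

print(by_label)
print(most_likely)
```

With a fitted model you would pass model.classes_ and a single row of model.predict_proba(X) instead of the hard-coded lists.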
Upvotes: 9