ABK

Reputation: 511

xgboost predict_proba: how to map the probabilities to the labels

I'm trying to solve a multiclass classification problem using the xgboost algorithm, but I do not know exactly how predict_proba works. predict_proba returns a list of probabilities, but I don't know which class each probability belongs to.

Here is a simple example:

This is my training data:

+------------+----------+-------+
| feature1   | feature2 | label |
+------------+----------+-------+
|    x       |    z     |   3   |
+------------+----------+-------+
|    y       |    u     |   0   |
+------------+----------+-------+
|    x       |    u     |   2   |
+------------+----------+-------+

Then I try to predict probabilities for a new example:

model.predict_proba(['x','u'])

This will return something like this:

[0.2, 0.3, 0.5]

My question is: which class has the probability of 0.5? Is it class 2, 3, or 0?

Upvotes: 6

Views: 25667

Answers (1)

Mischa Lisovyi

Reputation: 3223

It seems that you are using the sklearn API of xgboost. In that case the model has a dedicated attribute, model.classes_, that returns the classes learned by the model. The probabilities in each row returned by predict_proba follow the same order as the entries of model.classes_.

Here is an example with dummy data:

import numpy as np
import pandas as pd
import xgboost as xgb

# generate dummy data (10k examples, 10 numeric features, 4 classes of target)
np.random.seed(312)
train_X = np.random.random((10000,10))
train_y_mcc = np.random.randint(0, 4, train_X.shape[0]) #four classes:0,1,2,3

# model
xgb_model_mpg = xgb.XGBClassifier(max_depth= 3, n_estimators=100)
xgb_model_mpg.fit(train_X, train_y_mcc)

# classes learned by the model, in the order used by predict_proba
print(xgb_model_mpg.classes_)
# prints: [0 1 2 3]

Upvotes: 9
