Reputation: 2365
I trained a multi-class classification model with XGBoost and want to reimplement this model in another system.
Is it possible to get the text representation of my XGBClassifier model, the way dump_model does for an XGBoost Booster?
Edit: I found that model._Booster.dump_model(outputfile) produces a dump file like the one below. However, nothing in it specifies the class. My model has 10 classes, but the dump only contains numbered boosters, so I'm not sure whether it describes all classes or just one of them.
booster[0]:
0:[101<0.142245024] yes=1,no=2,missing=1
	1:[107<0.102833837] yes=3,no=4,missing=3
		3:[101<0.039123565] yes=7,no=8,missing=7
			7:leaf=-0.0142603116
			8:leaf=0.023763923
		4:[101<0.0646461397] yes=9,no=10,missing=9
			9:leaf=-0.0345750563
			10:leaf=-0.0135767004
	2:[107<0.238691002] yes=5,no=6,missing=5
		5:[103<0.0775454491] yes=11,no=12,missing=11
			11:leaf=0.188941464
			12:leaf=0.0651629418
		6:[101<0.999929309] yes=13,no=14,missing=13
			13:leaf=0.00403384864
			14:leaf=0.236842111
booster[1]:
0:[102<0.014829753] yes=1,no=2,missing=1
	1:[102<0.00999682024] yes=3,no=4,missing=3
		3:[107<0.0966737345] yes=7,no=8,missing=7
			7:leaf=-0.0387153365
			8:leaf=-0.0486520194
		4:[107<0.0922582299] yes=9,no=10,missing=9
			9:leaf=0.0301927216
			10:leaf=-0.0284226239
	2:[102<0.199759275] yes=5,no=6,missing=5
		5:[107<0.12201979] yes=11,no=12,missing=11
			11:leaf=0.093562685
			12:leaf=0.0127987256
		6:[107<0.298737913] yes=13,no=14,missing=13
			13:leaf=0.227570012
			14:leaf=0.113037519
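For reference, a minimal sketch of producing the same dump through the public API instead of the private _Booster attribute (model here stands for the already-fitted XGBClassifier):

booster = model.get_booster()         # public accessor for the underlying Booster
booster.dump_model("model_dump.txt")  # plain-text dump like the one above
booster.dump_model("model_dump.json", dump_format="json")  # structured dump, easier to parse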
Upvotes: 10
Views: 1990
Reputation: 340
Looking at the source code and the output on a sample dataset, it appears that the nth tree estimates the likelihood of a given instance belonging to class n modulo num_class. I believe xgboost uses the softmax function, so you'd want to add the output of tree i to weight[i % num_class] (here num_class = 10) and then take the softmax of the resulting weights.
Something like this should work, assuming you have a function booster_output(features, booster_index) that can determine the output of the nth booster tree for given feature values (one way to implement it is sketched after the code):
import numpy as np

num_class = 10      # number of classes in the model
num_boosters = 800  # total number of trees (boosting rounds * num_class)

# accumulate each tree's raw output into the weight of its class
weight_of_classes = [0.0] * num_class
for i in range(num_boosters):
    # tree i contributes to class i modulo num_class
    weight_of_classes[i % num_class] += booster_output(feature_values, i)

def softmax(x):
    e_x = np.exp(x - np.max(x))  # shift by the max for numerical stability
    return e_x / e_x.sum()

probability_of_classes = softmax(weight_of_classes)
print(probability_of_classes)
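For completeness, here's a rough sketch of one way to implement booster_output by parsing the text dump from the question. The parse_dump helper, the file path, and the feature_values dict are hypothetical names of mine, and the regular expressions assume exactly the dump format shown above; none of this is part of the xgboost API:

import re

SPLIT_RE = re.compile(r"(\d+):\[(\w+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)")
LEAF_RE = re.compile(r"(\d+):leaf=(\S+)")

def parse_dump(path):
    # returns a list of trees; each tree maps node id -> node tuple
    trees = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("booster["):
                trees.append({})
                continue
            m = SPLIT_RE.match(line)
            if m:
                nid, feat, thr, yes, no, miss = m.groups()
                trees[-1][int(nid)] = ("split", feat, float(thr),
                                       int(yes), int(no), int(miss))
                continue
            m = LEAF_RE.match(line)
            if m:
                nid, val = m.groups()
                trees[-1][int(nid)] = ("leaf", float(val))
    return trees

trees = parse_dump("model_dump.txt")

def booster_output(feature_values, booster_index):
    # walk tree `booster_index` down to a leaf; feature_values maps each
    # feature name as it appears in the dump (e.g. "101") to its value,
    # with missing features simply absent from the dict
    tree, nid = trees[booster_index], 0
    while True:
        node = tree[nid]
        if node[0] == "leaf":
            return node[1]
        _, feat, thr, yes, no, miss = node
        if feat not in feature_values:       # missing value
            nid = miss
        elif feature_values[feat] < thr:
            nid = yes
        else:
            nid = no

To sanity-check the reconstruction, you can compare weight_of_classes against booster.predict(dmatrix, output_margin=True), which returns the per-class sums before the softmax is applied, and probability_of_classes against model.predict_proba.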
Upvotes: 4