gracious_banana

Reputation: 1

XGBoost: How to calculate leaf weights in multi-class classification

I have trouble understanding how XGBoost calculates the leaf weights in multi-class classification.

I generated a simple example with two features and 3 classes. The training data looks like this:

     feature 0   feature 1   label y
x1       1           1          0
x2       2           3          1
x3       3           1          2

From what I have read so far, XGBoost uses a "one vs. all" principle for multi-class problems. Therefore, XGBoost creates a separate tree for each class in every boosting round. With 2 boosting rounds, XGBoost creates 6 trees: the first tree corresponds to class 0, the second to class 1, the third to class 2, the fourth to class 0 again, and so on.

I used the following Python code to plot the first three trees.

from xgboost import XGBClassifier, plot_tree
import numpy as np
import matplotlib.pyplot as plt

X_train = np.array([[1, 1], [2, 3], [3, 1]])
y_train = np.array([0, 1, 2])

param = {
    'max_depth': 2,
    'eta': 0.3,
    'objective': 'multi:softprob',
    'num_class': 3,
    'min_child_weight': 0,
    'n_estimators': 2,
    'base_score': 0.5
}

clf = XGBClassifier(**param)
clf.fit(X_train, y_train)

# one plot per tree of the first round: tree 0 -> class 0, tree 1 -> class 1, tree 2 -> class 2
plot_tree(clf, num_trees=0)
plot_tree(clf, num_trees=1)
plot_tree(clf, num_trees=2)
plt.show()

The result for the first tree (the class 0 booster) looks like this: [image: first tree booster]
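
Since the plotted image is not reproduced here, the same information can be read as text by dumping the fitted booster (my own sketch, assuming the clf fitted above). This also confirms the tree ordering: 2 rounds x 3 classes = 6 trees, cycling through class 0, 1, 2.

booster = clf.get_booster()
dumps = booster.get_dump()   # one text dump per tree: split conditions and leaf values
print(len(dumps))            # 6 trees = 2 boosting rounds * 3 classes
for i, tree_text in enumerate(dumps):
    print(f"tree {i} (class {i % 3}):")
    print(tree_text)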

My question is: how are the leaf weights calculated?

My thoughts so far:
As I understand it, XGBoost uses the log loss function for binary classification.
L(p, y) = -( y*log(p) + (1-y)*log(1-p) ), where p = 1/(1 + exp(-x))
Here y is the true label and p is the model's predicted probability that the sample belongs to class 1.
The gradient and hessian of this loss function are:
gradient: p - y
hessian: p(1-p)
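
As a small illustration (my own sketch, not from the post), these derivatives can be written out in a few lines and checked at a raw score of 0, i.e. p = 0.5:

import numpy as np

def sigmoid(x):
    # p = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def grad_hess_logloss(x, y):
    # first and second derivative of the binary log loss w.r.t. the raw score x
    p = sigmoid(x)
    return p - y, p * (1.0 - p)

print(grad_hess_logloss(0.0, 1))   # raw score 0 -> p = 0.5 -> gradient -0.5, hessian 0.25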

There is a nice explanation of how the ideal weight is calculated for a fixed tree structure (XGBoost Docs). Accordingly, the ideal weight is calculated by:
w_t = - G_t / (H_t + lambda)
where G_t is the sum of gradient statistics of all samples belonging to the leaf. Similarly, H_t is the sum of second-order statistics (the hessian) of all samples belonging to the leaf.
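
Written as code, the formula from the docs looks like this (a minimal helper that I reuse below for the hand calculation):

def leaf_weight(grads, hessians, reg_lambda=1.0):
    # w = -G / (H + lambda), with G and H summed over the samples in the leaf
    return -sum(grads) / (sum(hessians) + reg_lambda)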

Now I would like to calculate the weights of the first tree. The initial prediction score of all instances should be 0.5. Since this tree corresponds to class 0, the true labels should be y1 = 1, y2 = 0, y3 = 0 (since only sample 1 has class 0). The gradient and hessian statistics are:
g1 = 0.5 - 1 = -0.5
g2 = 0.5 - 0 = 0.5
g3 = 0.5 - 0 = 0.5

h1 = 0.5(1-0.5) = 0.25
h2 = 0.5(1-0.5) = 0.25
h3 = 0.5(1-0.5) = 0.25

The first tree has only one split criterion f0 < 1.5. Therefore sample 1 is assigned to the left leaf and samples 2 and 3 are assigned to the right leaf.

The weights should be calculated by:
w1 = -g1 / (h1 + lambda) * eta = 0.5 / (0.25 + 1) * 0.3 = 0.12
w2 = -(g2 + g3) / (h2 + h3 + lambda) * eta = -(0.5 + 0.5) / (0.25 + 0.25 + 1) * 0.3 = -0.2
with eta the learning rate
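
Using the leaf_weight helper sketched above, my attempt reproduces exactly these numbers:

eta, reg_lambda = 0.3, 1.0
g = [-0.5, 0.5, 0.5]     # p - y with p = 0.5 and y = (1, 0, 0)
h = [0.25, 0.25, 0.25]   # p * (1 - p) with p = 0.5

w1 = leaf_weight(g[:1], h[:1], reg_lambda) * eta   # sample 1 in the left leaf   ->  0.12
w2 = leaf_weight(g[1:], h[1:], reg_lambda) * eta   # samples 2, 3 in the right leaf -> -0.2
print(w1, w2)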

My results do not match the output of XGBoost, since w1 and w2 should be approximately 0.1384 and -0.106.

What am I doing wrong?

I think my real problem is that I do not understand the log loss and how the sigmoid function influences the result.

I would be very grateful if someone can help me with this.

Upvotes: 0

Views: 1067

Answers (1)

gracious_banana

Reputation: 1

I found my mistake; actually, there are two mistakes:

  1. The hessian is wrong: XGBoost's multiclass softmax objective uses 2*p*(1-p) instead of p*(1-p).
  2. The prediction scores are normalized by the softmax function -> exp(0.5) / (3 * exp(0.5)) = 0.33

g1 = 0.33 - 1 = -0.67
g2 = g3 = 0.33 - 0 = 0.33
h1 = h2 = h3 = 2 * 0.33 * (1 - 0.33) = 0.4422

w1 = -(-0.67) / (0.4422 + 1) * 0.3 = 0.139
w2 = -(0.33 + 0.33) / (0.4422 + 0.4422 + 1) * 0.3 = -0.105
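
Put into code (my own check; the factor 2 in the hessian is what the softmax objective uses, and 0.33 is really 1/3), the corrected calculation reproduces the values XGBoost prints for the first tree:

import numpy as np

eta, reg_lambda = 0.3, 1.0

# base_score = 0.5 for every class -> softmax gives p = 1/3 for class 0 on all samples
p = np.exp(0.5) / (3 * np.exp(0.5))   # = 1/3

y = np.array([1.0, 0.0, 0.0])         # class-0 indicator for samples x1, x2, x3
g = p - y                             # [-0.667, 0.333, 0.333]
h = np.full(3, 2 * p * (1 - p))       # 2 * p * (1 - p) = 0.444...

w1 = -g[0] / (h[0] + reg_lambda) * eta                   # left leaf  -> ~0.1384
w2 = -(g[1] + g[2]) / (h[1] + h[2] + reg_lambda) * eta   # right leaf -> ~-0.106
print(w1, w2)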

Upvotes: 0
