Joey Gao

Reputation: 949

How are weight and value calculated in LightGBM?

We can use trees_to_dataframe or lgb.create_tree_digraph to display the structure of a LightGBM model. Both internal nodes and leaf nodes have a weight and a value.

The documentation says:

value : float64, predicted value for this leaf node, multiplied by the learning rate.

weight : float64 or int64, sum of hessian (second-order derivative of objective), summed over observations that fall in this node.

How are these two values calculated?

We know that, for binary log loss:

$$
\begin{aligned}
G &= \hat{y}-y \\
H &= \hat{y}(1-\hat{y})
\end{aligned}
$$
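
To make the two formulas concrete, here is a minimal sketch in plain Python of how the per-node sums of $G$ and $H$ could be turned into a weight and a value. The function names are hypothetical, and the leaf formula $-G/(H+\lambda)\cdot\text{learning rate}$ is the standard second-order (Newton) step; it is an assumption that this matches LightGBM with default settings, and it ignores lambda_l1, max_delta_step, and any initial score added to the first tree.

```python
import math


def sigmoid(raw_score):
    """Map a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-raw_score))


def binary_logloss_grad_hess(raw_score, label):
    """Per-observation G = p - y and H = p * (1 - p) for binary log loss."""
    p = sigmoid(raw_score)
    return p - label, p * (1.0 - p)


def node_stats(raw_scores, labels):
    """Sum G and H over the observations that fall in one node.

    The sum of H is what the documentation calls the node's weight.
    """
    sum_grad = 0.0
    sum_hess = 0.0
    for score, label in zip(raw_scores, labels):
        g, h = binary_logloss_grad_hess(score, label)
        sum_grad += g
        sum_hess += h
    return sum_grad, sum_hess


def newton_leaf_value(sum_grad, sum_hess, learning_rate=0.1, lambda_l2=0.0):
    """Assumed leaf output: Newton step scaled by the learning rate."""
    return -sum_grad / (sum_hess + lambda_l2) * learning_rate


# Toy node: four observations, all starting from a raw score of 0 (p = 0.5).
raw_scores = [0.0, 0.0, 0.0, 0.0]
labels = [1, 1, 1, 0]

G, H = node_stats(raw_scores, labels)
leaf_value = newton_leaf_value(G, H)
print("sum of gradients:", G)
print("sum of hessians (weight):", H)
print("leaf value:", leaf_value)
```

In this toy node three of the four labels are positive, so the summed gradient is negative and the resulting leaf value pushes the raw score upward.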

  1. $H$ cannot be $0$, because there is a base value before the first tree is built. So why is the weight of each tree's root node $0$ in the following example?
  2. How is an internal node's value calculated?

import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the binary classification data set as (features, labels)
X, y = load_breast_cancer(return_X_y=True)
model = lgb.LGBMClassifier(random_state=1, n_estimators=2,
                           max_depth=1,
                           min_child_weight=15,
                           objective='binary'
                          )
model.fit(X, y)
model.booster_.trees_to_dataframe()
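
| | tree_index | node_depth | node_index | left_child | right_child | parent_index | split_feature | split_gain | threshold | decision_type | missing_direction | missing_type | value | weight | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0-S0 | 0-L0 | 0-L1 | | Column_23 | 392.505 | 868.2 | <= | left | None | 0.52115 | 0 | 569 |
| 1 | 0 | 2 | 0-L0 | | | 0-S0 | | nan | nan | | | | 0.641339 | 89.2982 | 382 |
| 2 | 0 | 2 | 0-L1 | | | 0-S0 | | nan | nan | | | | 0.275629 | 43.7141 | 187 |
| 3 | 1 | 1 | 1-S0 | 1-L0 | 1-L1 | | Column_7 | 327.362 | 0.05142 | <= | left | None | 0 | 0 | 569 |
| 4 | 1 | 2 | 1-L0 | | | 1-S0 | | nan | nan | | | | 0.128938 | 79.2656 | 349 |
| 5 | 1 | 2 | 1-L1 | | | 1-S0 | | nan | nan | | | | -0.19224 | 52.9234 | 220 |
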
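
The leaf weights of tree 0 in the output above can be checked by hand. The sketch below assumes that the root value of the first tree (0.52115) is the initial raw score shared by every observation before any split, so each observation contributes the same hessian $p_0(1-p_0)$. That reading is an assumption, not something the output states; all numbers are copied from the output, and the variable names are only illustrative.

```python
import math

# Values copied from the trees_to_dataframe() output for tree 0.
root_value_tree0 = 0.52115  # assumed to be the initial raw score (log-odds)
root_count = 569
leaves_tree0 = {
    "0-L0": {"count": 382, "weight": 89.2982},
    "0-L1": {"count": 187, "weight": 43.7141},
}

# Before the first tree, every observation has the same predicted probability,
# so every observation has the same hessian p0 * (1 - p0).
p0 = 1.0 / (1.0 + math.exp(-root_value_tree0))
h0 = p0 * (1.0 - p0)

# Under that assumption, a node's sum of hessians is simply count * h0.
for node_index, info in leaves_tree0.items():
    estimated_weight = info["count"] * h0
    print(node_index, "estimated:", round(estimated_weight, 4),
          "reported:", info["weight"])

# The same reasoning gives the sum of hessians over all 569 observations.
estimated_root_weight = root_count * h0
reported_leaf_weight_sum = sum(info["weight"] for info in leaves_tree0.values())
print("estimated root weight:", round(estimated_root_weight, 4))
print("sum of reported leaf weights:", round(reported_leaf_weight_sum, 4))
```

The estimates agree with the reported leaf weights to within rounding, and they add up to roughly 133 for the root. The reported root weight of 0 therefore does not look like a sum of hessians over the 569 observations. One reading consistent with this is that the root's weight is simply not populated in this output, but that is a guess to be checked against the LightGBM source.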
lgb.create_tree_digraph(model, tree_index=0, show_info=['split_gain', 'internal_value', 'internal_count',
                                                        'internal_weight', 'leaf_count', 'leaf_weight', 'data_percentage'])

[Image: rendered digraph of tree 0]


Upvotes: 0

Views: 3609

Answers (1)

Roger P L

Reputation: 11

Just a quick question: have you studied how a standard gradient boosted tree model works, for example the one in the scikit-learn library?

You are using LightGBM, which is a fantastic but advanced algorithm. It may help to start with the general concepts, the math behind the algorithm, and the papers; then you should be able to answer this yourself.

The LightGBM paper:

https://papers.nips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf

The source code on GitHub, where you can explore everything you need:

https://github.com/microsoft/LightGBM

And the documentation:

https://lightgbm.readthedocs.io/en/latest/

Upvotes: 1
