dsl1990
dsl1990

Reputation: 1209

What does the value of 'leaf' in the following xgboost model tree diagram means?

enter image description here

I am guessing that it is conditional probability given that the above (tree branch) condition exists. However, I am not clear on it.

If you want to read more about the data used or how do we get this diagram then go to : http://machinelearningmastery.com/visualize-gradient-boosting-decision-trees-xgboost-python/

Upvotes: 25

Views: 22150

Answers (4)

sameershah141
sameershah141

Reputation: 348

If it is a regression model (objective can be reg:squarederror), then the leaf value is the prediction of that tree for the given data point. The leaf value can be negative based on your target variable. The final prediction for that data point will be sum of leaf values in all the trees for that point.

If it is a classification model (objective can be binary:logistic), then the leaf value is representative (like raw score) for the probability of the data point belonging to the positive class. The final probability prediction is obtained by taking sum of leaf values (raw scores) in all the trees and then transforming it between 0 and 1 using a sigmoid function. The leaf value (raw score) can be negative, the value 0 actually represents probability being 1/2.

Please find more details about the parameters and outputs at - https://xgboost.readthedocs.io/en/latest/parameter.html

Upvotes: 10

Allen Qin
Allen Qin

Reputation: 19947

For a classification tree with 2 classes {0,1}, the value of the leaf node represent the raw score for class 1. It can be converted to a probability score by using the logistic function. The calculation below use the left most leaf as an example.

1/(1+np.exp(-1*0.167528))=0.5417843204057448

What this means is if a data point ends up being distributed to this leaf, the probability of this data point being class 1 is 0.5417843204057448.

Upvotes: 27

Wasi Ahmad
Wasi Ahmad

Reputation: 37691

You are correct. Those probability values associated with leaf nodes are representing the conditional probability of reaching leaf nodes given a specific branch of the tree. Branches of trees can be presented as a set of rules. For example, @user1808924 mentioned in his answer; one rule which is representing the left-most branch of your tree model.

So, in short: The tree can be linearized into decision rules, where the outcome is the contents of the leaf node, and the conditions along the path form a conjunction in the if clause. In general, the rules have the form:

if condition1 and condition2 and condition3 then outcome.

Decision rules can be generated by constructing association rules with the target variable on the right. They can also denote temporal or causal relations.

Upvotes: 2

user1808924
user1808924

Reputation: 4926

Attribute leaf is the predicted value. In other words, if the evaluation of a tree model ends at that terminal node (aka leaf node), then this is the value that is returned.

In pseudocode (the left-most branch of your tree model):

if(f1 < 127.5){
  if(f7 < 28.5){
    if(f5 < 45.4){
      return 0.167528f;
    } else {
      return 0.05f;
    }
  }
}

Upvotes: 7

Related Questions