Selah

Predicted classification probability from a tree node in trained sklearn DecisionTreeClassifier

I have fit an instance of DecisionTreeClassifier and I am trying to extract the prediction probability for each node. I need this to create a custom decision tree visualization similar to the one shown below.

I can already export the split feature and threshold for each node:

dtc.tree_.feature
Out[72]: array([93, 36, 92, 51, 84, -2, 20, -2, -2, -2, -2, -2,  6, -2, -2])

dtc.tree_.threshold
Out[73]: 
array([ 50.5       ,   0.5       ,  85.50991821,   0.5       ,
         5.5       ,  -2.        ,   0.5       ,  -2.        ,
        -2.        ,  -2.        ,  -2.        ,  -2.        ,
         0.5       ,  -2.        ,  -2.        ])
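To connect these arrays to the tree structure, here is a minimal sketch that walks a fitted tree from the root and prints the split rule at each node. It uses iris as stand-in data because the question's dataset is not shown, and `describe_nodes` is a hypothetical helper. A node is treated as a leaf when `children_left` and `children_right` are equal (both -1). Leaves are also where `feature` and `threshold` hold the -2 values in the output above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the question's own dataset is unknown.
X, y = load_iris(return_X_y=True)
dtc = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)


def describe_nodes(tree):
    """Return one text line per node, visiting nodes depth-first (pre-order)."""
    lines = []
    stack = [(0, 0)]  # (node_id, depth), starting at the root node 0
    while stack:
        node_id, depth = stack.pop()
        left = tree.children_left[node_id]
        right = tree.children_right[node_id]
        indent = "  " * depth
        if left == right:
            # Leaf: both child ids are -1; feature/threshold hold the -2 sentinel.
            lines.append(f"{indent}node {node_id}: leaf")
        else:
            f, t = tree.feature[node_id], tree.threshold[node_id]
            lines.append(f"{indent}node {node_id}: X[:, {f}] <= {t:.4g}")
            # Push the right child first so the left child is visited first.
            stack.append((right, depth + 1))
            stack.append((left, depth + 1))
    return lines


print("\n".join(describe_nodes(dtc.tree_)))
```

Pushing the right child before the left gives a pre-order traversal, which is convenient for drawing an indented text tree.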

Ideally, I would get the prediction probability for each node from something similar to this:

dtc.tree_.probability
Out[xx]:
array([0.50, 0.42, 0.21, 0.45, 0.62, ....])

Is this possible?

[Image: example of the desired custom decision tree visualization]


Answers (1)

Selah

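I discovered that `value` describes how each node's training samples are distributed across the classes, and that my "prediction probability" can be taken as the proportion of a node's samples that belong to a given class. Thus I can calculate it as follows:

class_counts = dtc.tree_.value[:, 0, :]  # shape (node_count, n_classes), first output
# Normalize each row instead of dividing by n_node_samples: depending on the
# scikit-learn version, value holds (weighted) class counts or class fractions.
probs = (class_counts[:, 1] / class_counts.sum(axis=1)).tolist()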
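As a self-contained check of this idea, the sketch below fits a tree on a synthetic binary dataset (a hypothetical stand-in for the question's data), computes per-node class proportions with a hypothetical helper `node_class_probabilities`, and confirms that the rows for leaf nodes agree with `predict_proba` for the samples that land in those leaves (found with `apply`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: a small synthetic binary problem.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
dtc = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)


def node_class_probabilities(clf):
    """Per-node class proportions, shape (node_count, n_classes).

    Each row of tree_.value is rescaled to sum to 1, so this works whether
    the tree stores (weighted) class counts or already-normalized fractions.
    """
    value = clf.tree_.value[:, 0, :]  # first (and only) output
    return value / value.sum(axis=1, keepdims=True)


node_probs = node_class_probabilities(dtc)
class1_probs = node_probs[:, 1].tolist()  # one entry per node, root first

# Sanity check: each sample's leaf row should equal its predict_proba row.
leaf_ids = dtc.apply(X)
print(np.allclose(node_probs[leaf_ids], dtc.predict_proba(X)))
```

Normalizing by the row sum rather than by `n_node_samples` matters because `n_node_samples` counts raw, unweighted samples, so it stops matching `value` once `sample_weight` or `class_weight` is used.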
