Reputation: 8064
I have fit an instance of DecisionTreeClassifier and am trying to extract the prediction probabilities for each node. I need them in order to create a custom decision tree visualization similar to what is shown below.
I can export features and thresholds for each node.
dtc.tree_.feature
Out[72]: array([93, 36, 92, 51, 84, -2, 20, -2, -2, -2, -2, -2, 6, -2, -2])
dtc.tree_.threshold
Out[73]:
array([ 50.5 , 0.5 , 85.50991821, 0.5 ,
5.5 , -2. , 0.5 , -2. ,
-2. , -2. , -2. , -2. ,
0.5 , -2. , -2. ])
Ideally, I would export the prediction probabilities for each node with something like the following (a hypothetical attribute that I wish existed):
dtc.tree_.probability
Out[xx]:
array([0.50, 0.42, 0.21, 0.45, 0.62, ....])
Is this possible?
Upvotes: 1
Views: 2482
Reputation: 8064
I discovered that dtc.tree_.value holds, for each node, the count of samples that fall in each class, so the "prediction probability" for a node can be taken as the proportion of its samples that fall in a given class. It can therefore be calculated as follows:
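# Number of training samples that reach each node
samples = dtc.tree_.n_node_samples
# Class-1 entry of value for each node (output 0, class index 1)
class1_positives = dtc.tree_.value[:, 0, 1]
# Per-node proportion of class 1, as a plain Python list
probs = (class1_positives / samples).tolist()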
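For anyone who wants to check this end to end, here is a minimal, self-contained sketch of the same idea. The synthetic dataset, the max_depth setting, and the variable names are my own assumptions for illustration, not part of the original problem. Instead of dividing by n_node_samples, it normalizes each node's row of tree_.value so that it sums to 1, which should give the same proportions when value holds counts and still behaves sensibly if value is already stored as fractions or if sample weights were used. The final check compares the leaf rows against predict_proba.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data, used purely for illustration (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
dtc = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# tree_.value has shape (n_nodes, n_outputs, n_classes);
# keep the only output of this single-output classifier.
value = dtc.tree_.value[:, 0, :]

# Normalize each node's row so the class entries sum to 1.
node_probs = value / value.sum(axis=1, keepdims=True)

# Proportion of class 1 at every node (internal nodes and leaves).
class1_probs = node_probs[:, 1]

# Sanity check: for the leaf each sample lands in, the row of
# node_probs should match what predict_proba reports.
leaf_ids = dtc.apply(X)
assert np.allclose(node_probs[leaf_ids], dtc.predict_proba(X))

print(class1_probs.tolist())
```

The resulting class1_probs array is indexed by node id, so it lines up with dtc.tree_.feature and dtc.tree_.threshold for use in a custom plot.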
Upvotes: 2