Reputation: 40969
I'm using Weka's J48 (C4.5) decision tree classifier. In general, can a decision tree produce a class probability distribution once an instance reaches a leaf? I know that Naive Bayes produces a probability distribution over the classes for every instance it classifies.
If this is possible with a decision tree, is the capability available in Weka's J48? Alternatively, I could try to implement my own tree.
Upvotes: 3
Views: 2655
Reputation: 12152
Yes: each leaf holds a classification decision, which is in fact a discrete distribution, one that assigns 100% to the class the leaf indicates and 0% to all other classes. You could also use the training set to generate a distribution for every inner node, if you want.
If you prune after learning the tree, you can re-run the training set through it and label each leaf with the frequency of each actual class landing in that leaf; that would be your distribution.
EDIT: For example, once you have your tree, associate with each node a histogram with one bin per class. Then classify the training set: whenever an instance passes through a node, add one to the bin for that instance's actual class. After the full training set has gone through, normalize each histogram so it sums to 1. In the end, if you feel the leaves are too close to 100%, you can use the entropy of each histogram to decide where to prune further, for example.
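A minimal sketch of the relabeling step above, in plain Python rather than Weka's API (the toy tree structure, `find_leaf`, and `leaf_distributions` are hypothetical names for illustration): route each training example to its leaf, count the actual classes per leaf, then normalize each count histogram into a distribution.

```python
from collections import Counter

# Toy tree: internal nodes test one feature against a threshold;
# leaves carry a string id. (Hypothetical structure, not Weka's.)
tree = {
    "feature": 0, "threshold": 5.0,
    "left": {"leaf": "L1"},
    "right": {"leaf": "L2"},
}

def find_leaf(node, x):
    """Follow the tree's tests until a leaf id is reached."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

def leaf_distributions(tree, training_set):
    """Count the actual class of every training example at its leaf,
    then normalize each leaf's histogram so it sums to 1."""
    counts = {}
    for x, y in training_set:
        counts.setdefault(find_leaf(tree, x), Counter())[y] += 1
    return {
        leaf: {cls: n / sum(hist.values()) for cls, n in hist.items()}
        for leaf, hist in counts.items()
    }

training_set = [
    ([2.0], "a"), ([3.0], "a"), ([4.0], "b"),  # routed to L1
    ([7.0], "b"), ([8.0], "b"),                # routed to L2
]
dists = leaf_distributions(tree, training_set)
# L1 sees two "a" and one "b", so its distribution is {a: 2/3, b: 1/3};
# L2 sees only "b", so it stays at 100% for "b".
```

The same loop could update a histogram at every inner node along the path, not just the leaf, if you also want distributions for internal nodes as mentioned above.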
Upvotes: 6