Reputation: 11
I have used the output predictions of J48 classifier in Weka and got the results with predictions (probability). As I need to use these predictions number in my research, I need to know how the weka calculates these numbers? What is the formula? Is it specified for each classifier?
Upvotes: 0
Views: 2031
Reputation: 748
In addition to Jan Eglinger answer.
The J48 classifier is Weka's implementation of the infamous C4.5 decision tree classifier, which is a classification algorithm based on ID3 that classifies using information entropy.
The training data is a set S = {s_1, s_2, ...}
of already classified samples. Each sample s_i consists of a p-dimensional vector (x_{1,i}, x_{2,i}, ...,x_{p,i})
, where the x_j
represent attribute values or features of the sample, as well as the class in which s_i
falls.
At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurs on the smaller sublists.
This algorithm has a few base cases.
All the samples in the list belong to the same class. When this happens, it simply creates a leaf node for the decision tree saying to choose that class.
None of the features provide any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class.
Instance of previously-unseen class encountered. Again, C4.5 creates a decision node higher up the tree using the expected value.
You can find the information Gain and entropy in the Weka Api package. For that you need to start dubbing the java weka api and go through each step.
In general, if you don't worry about how algorithm works internally using high level mathematics. Try to calculate InformationGain and entropy and explain them in your research apart from decision trees, you have methods for both of these to calculate their value.
Upvotes: 1
Reputation: 4090
What is the formula?
Weka's J48 classifier is an implementation of the C4.5 algorithm.
I need to know how the weka calculates these numbers?
You can find implementation details in J48.java
and in the weka.classifiers.trees.j48
package.
Upvotes: 0