Reputation: 21
I ran a Naive Bayes classification using 10-fold cross-validation and obtained a prediction table on the test data that looks like this:
=== Predictions on test data ===
inst#  actual      predicted   error  prediction  (name)
1      3:no_chang  3:no_chang         0.943       (region_1)
2      1:active_K  1:active_K         1           (region_2)
3      3:no_chang  3:no_chang         0.912       (region_3)
4      3:no_chang  3:no_chang         0.858       (region_4)
5      3:no_chang  2:active_G  +      0.518       (region_5)
I want to know how the "prediction" column is calculated. I know that it goes from 0 to 1, 1 meaning that the prediction is "better", but that's all I could find after a considerable amount of time googling and browsing the Weka book.
I know there is plenty of information about Weka online, but I'm a bit overwhelmed by it and can't easily find the answer to my simple question. Also, can someone point me to a good, detailed Weka manual for command-line users? The Weka book seems to focus too much on explaining how the GUI works, which doesn't really interest me, since for the moment I mainly work with the command-line tools.
Thank you,
Juan
Upvotes: 2
Views: 3478
Reputation: 3258
Looking at the source code for the NaiveBayes class, there is a variable called m_ClassDistribution which keeps track of the class prediction.
In the training phase, this variable is updated to reflect the prior probability of each class. In the test phase, it is used to calculate the posterior probability that a given sample belongs to a given class.
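To make the training-phase step concrete, here is a minimal Python sketch of estimating class priors from label counts. It is not Weka's code: the function name, the made-up labels, and the add-one smoothing are all assumptions for illustration.

```python
from collections import Counter

def class_priors(labels, classes):
    """Estimate P(class) from training labels (hypothetical helper).

    Add-one smoothing is an assumption here, so that a class with
    no training examples still gets a non-zero prior.
    """
    counts = Counter(labels)
    total = len(labels) + len(classes)
    return {c: (counts[c] + 1) / total for c in classes}

classes = ["active_K", "active_G", "no_chang"]
# Made-up training labels, only for illustration.
train_labels = ["no_chang"] * 6 + ["active_K"] * 2 + ["active_G"] * 1
priors = class_priors(train_labels, classes)
```

The priors sum to 1 and are later multiplied by the per-feature likelihoods to score each class.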
I would recommend looking at the code for DiscreteEstimator and NaiveBayes, particularly the distributionForInstance function, which is used in the test phase. It differs slightly from the textbook Naive Bayes calculation, as it also takes into account a weight associated with each feature.
Upvotes: 1