Reputation: 33
I am new to statistics and data analysis in R. Today I was trying the Naive Bayes algorithm in R. The problem I am facing is that I cannot understand the output of the prediction. The code is as follows:
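# Install and load the packages (ElemStatLearn provides the spam data set)
install.packages('ElemStatLearn')
library('ElemStatLearn')
library("klaR")
library("caret")
# Split the data: 90% for training, 10% for testing
sub = sample(nrow(spam), floor(nrow(spam) * 0.9))
train = spam[sub,]
test = spam[-sub,]
# Column 58 is the class label (spam); the remaining columns are the predictors
xTrain = train[,-58]
yTrain = train$spam
xTest = test[,-58]
yTest = test$spam
# Fit Naive Bayes with 10-fold cross-validation
model = train(xTrain,yTrain,'nb',trControl=trainControl(method='cv',number=10))
# Cross-tabulate predicted class (rows) against true class (columns), as proportions
prop.table(table(predict(model$finalModel,xTest)$class,yTest))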
The result displayed is as follows:
       yTest
             email       spam
  email 0.33405640 0.02603037
  spam  0.24945770 0.39045553
For reference, this is the tutorial I followed: http://joshwalters.com/2012/11/27/naive-bayes-classification-in-r.html
Upvotes: 1
Views: 2221
Reputation: 1449
The result that you have displayed is called a 'confusion matrix'. It is used to check how well your classifier has worked.
You will need to understand a few terms here: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
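Compare the standard layout of a confusion matrix (correct predictions, TP and TN, on the diagonal; errors, FP and FN, off the diagonal) with your case.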
The diagonal from top left to bottom right gives you the proportion of correct predictions, and the other two values give the proportion of cases where your classifier got "confused".
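As a rough sketch of that reading, the four proportions from the question can be put back into a matrix and the diagonal summed. Treating "spam" as the positive class is an assumption made here for illustration, and the names `cm`, `TP`, `FP`, `FN`, `TN` are hypothetical:

```r
# Proportions copied from the question's output
# (rows = predicted class, columns = actual class)
cm <- matrix(c(0.33405640, 0.24945770,   # actual email column
               0.02603037, 0.39045553),  # actual spam column
             nrow = 2,
             dimnames = list(predicted = c("email", "spam"),
                             actual    = c("email", "spam")))

# Assumption: "spam" is the positive class
TP <- cm["spam",  "spam"]   # predicted spam, actually spam
FP <- cm["spam",  "email"]  # predicted spam, actually email
FN <- cm["email", "spam"]   # predicted email, actually spam
TN <- cm["email", "email"]  # predicted email, actually email

accuracy   <- sum(diag(cm))  # same as TP + TN, because cm holds proportions
error_rate <- 1 - accuracy   # same as FP + FN

round(accuracy, 4)
```

With these numbers, roughly 72% of the test messages are classified correctly, and most of the errors are real emails flagged as spam.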
Hope this gives an initial idea. Search for "confusion matrix" and you can find more. One good link is here: https://classeval.wordpress.com/introduction/basic-evaluation-measures/
Upvotes: 1
Reputation: 1001
This is not the output of the Naive Bayes model itself.
Once you have called predict, you don't really "care" about the model anymore, because you have already obtained the predictions.
prop.table
converts the count for each combination into a proportion of the entire population. You might want to look at the table without the proportion part to see the actual counts.
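A small sketch of the difference, using made-up toy vectors rather than the spam data (`predicted` and `actual` are hypothetical names, and the six values are invented for illustration):

```r
# Toy predictions and true labels, invented for illustration only
lv        <- c("email", "spam")
predicted <- factor(c("email", "email", "spam", "spam", "spam", "email"), levels = lv)
actual    <- factor(c("email", "spam",  "spam", "email", "spam", "email"), levels = lv)

# Raw counts: how many cases fall into each predicted/actual combination
counts <- table(predicted, actual)
print(counts)

# Share of the whole sample in each cell (all four cells sum to 1)
overall <- prop.table(counts)
print(overall)

# Share within each actual class (each column sums to 1)
by_actual <- prop.table(counts, margin = 2)
print(by_actual)
```

In the question's code, dropping the outer `prop.table(...)` call and keeping only `table(...)` would show the counts in the same layout.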
For example, 33.4% of the test messages were classified as email and are actually email, while 2.6% were classified as email but are actually spam.
Upvotes: 0