Tappy

Reputation: 33

Explanation of output for Naive Bayes algorithm in R

I am new to statistics and data analysis in R. Today I was trying the Naive Bayes algorithm in R. The problem I am facing is that I am unable to understand the output of the prediction. The code is as follows:

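install.packages('ElemStatLearn')  # provides the spam data set
library('ElemStatLearn')
library("klaR")   # NaiveBayes implementation used by caret's 'nb' method
library("caret")

# Use a random 90% of the rows for training and hold out the rest for testing
sub = sample(nrow(spam), floor(nrow(spam) * 0.9))
train = spam[sub,]
test = spam[-sub,]

# Column 58 is the class label (train$spam); the other columns are features
xTrain = train[,-58]
yTrain = train$spam
xTest = test[,-58]
yTest = test$spam

# Fit Naive Bayes with 10-fold cross-validation
model = train(xTrain,yTrain,'nb',trControl=trainControl(method='cv',number=10))

# Rows: predicted class; columns: true class; values: share of the test set
prop.table(table(predict(model$finalModel,xTest)$class,yTest))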

The result is displayed as follows:

   yTest
             email       spam
  email 0.33405640 0.02603037
  spam  0.24945770 0.39045553

You can refer to this link for the original example: http://joshwalters.com/2012/11/27/naive-bayes-classification-in-r.html

Upvotes: 1

Views: 2221

Answers (2)

Indi

Reputation: 1449

The result that you have displayed is called a 'confusion matrix'. It is used to verify how well your classifier has worked.

You will need to understand a few terms here: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).

Compare: [image] with your case: [image]

The diagonal from top left to bottom right gives you the percentage of correct predictions, and the other two values indicate the percentage of cases where your classifier got "confused".
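As a rough sketch of that reading, the proportions from the question can be typed into a matrix and the diagonal summed. The matrix name `m` and the choice of "spam" as the positive class are assumptions made for illustration; the thread does not say which class counts as positive.

```r
# Proportions copied from the question's output.
# Rows are the predicted class, columns are the true class (yTest).
m <- matrix(c(0.33405640, 0.24945770,   # column 1: actually email
              0.02603037, 0.39045553),  # column 2: actually spam
            nrow = 2,
            dimnames = list(predicted = c("email", "spam"),
                            actual    = c("email", "spam")))

accuracy   <- sum(diag(m))   # share of correct predictions, about 0.72
error_rate <- 1 - accuracy   # share of "confused" predictions, about 0.28

# Assumption: "spam" is treated as the positive class.
tp <- m["spam",  "spam"]    # predicted spam, actually spam
fp <- m["spam",  "email"]   # predicted spam, actually email
fn <- m["email", "spam"]    # predicted email, actually spam
tn <- m["email", "email"]   # predicted email, actually email

print(round(c(accuracy = accuracy, error_rate = error_rate), 3))
```

With these numbers the classifier is right on roughly 72% of the test messages, and most of its errors are emails flagged as spam (about 25%).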

Hope this gives you an initial idea. Search for "confusion matrix" and you can find more. One good link is here: https://classeval.wordpress.com/introduction/basic-evaluation-measures/

Upvotes: 1

Shahar Bental

Reputation: 1001

This is not the output of the Naive Bayes model itself.

Once you have used predict, you don't really "care" about the model, because you have already obtained the predictions.

prop.table converts each combination into its proportion of the entire population. You might want to look at the table without the proportion part, to see the actual counts.
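A small sketch of the difference, using made-up labels rather than the spam data (the vectors `predicted` and `actual` are hypothetical):

```r
# Made-up labels for illustration only; these are not the spam results.
predicted <- factor(c("email", "email", "spam", "spam",
                      "spam",  "email", "spam", "spam"),
                    levels = c("email", "spam"))
actual    <- factor(c("email", "spam",  "spam", "email",
                      "spam",  "email", "spam", "email"),
                    levels = c("email", "spam"))

counts <- table(predicted, actual)  # raw number of cases in each cell
props  <- prop.table(counts)        # each cell divided by the grand total

print(counts)
print(props)
```

In the question's code, the equivalent of `counts` would be the same call with the outer `prop.table(...)` removed, that is, `table(predict(model$finalModel, xTest)$class, yTest)`.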

For example, 33.4% of the test messages were predicted as email and actually are email, while 2.6% were predicted as email but are actually spam.
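The figures above are shares of the whole test set. If you would rather see shares within each true class, one option is the `margin` argument of `prop.table`. The sketch below re-enters the question's numbers by hand; the names `m` and `by_actual` are illustrative only.

```r
# Proportions copied from the question's output.
# Rows are the predicted class, columns are the true class (yTest).
m <- matrix(c(0.33405640, 0.24945770,
              0.02603037, 0.39045553),
            nrow = 2,
            dimnames = list(predicted = c("email", "spam"),
                            actual    = c("email", "spam")))

# margin = 2 rescales every column so that it sums to 1,
# giving the share of each true class that received each prediction.
by_actual <- prop.table(m, margin = 2)

print(round(by_actual, 3))
```

Read this way, about 94% of the actual spam was predicted as spam, while only about 57% of the actual emails were predicted as email.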

Upvotes: 0
