Naive Bayes classification in R gives the opposite result

Question

I am trying to do a Naive Bayes classification in R (package e1071). I tried the usual golf example and I always get the opposite of the result I expect.

Scenario: if the weather is good, do I play golf, 'Yes' or 'No'? A very straightforward case.

I created a training dataset (df) and, based on it, I expect 'Yes' for 'Good' weather, but the prediction is 'No'.

[1] No
Levels: No Yes

Is there a reason why this happens? Is my understanding wrong, or am I doing something wrong?

Any help is much appreciated.

Cheers..!

weather <- c("Good", "Good", "Good", "Bad", "Bad","Good")
golf <- c("Yes","No","Yes","No","Yes","Yes")
df <- data.frame(weather, golf) #Training dataset

df[] <- lapply(df, factor) #changing df to factor variables

df_new <- data.frame(weather = "Good") #Test dataset

library(e1071)
model <- naiveBayes(golf ~.,data=df)
predict(model, df_new, type ="class")

AshOfFire · Accepted Answer

This is because factor encoding can be misleading. If you do not make sure that the factors in df and df_new are encoded the same way, you will get results that look absurd compared with the labels you see printed.

Take a look at the integer encoding of df

print(df$weather)
# Good Good Good Bad  Bad  Good
print(as.integer(df$weather))
# 2 2 2 1 1 2

And compare it to the encoding of df_new

print(df_new$weather)
# Good
print(as.integer(df_new$weather))
# 1

Good has been mapped to 1 in df_new, while 1 corresponds to Bad in df. So when you apply your model, you are asking for a prediction based on Bad weather.
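
To see why the mismatch changes the answer, the class scores can be worked out by hand from the training data. This is a minimal base-R sketch, assuming no Laplace smoothing; the names prior, cond and score are illustrative and not part of e1071.

```r
weather <- c("Good", "Good", "Good", "Bad", "Bad", "Good")
golf    <- c("Yes", "No", "Yes", "No", "Yes", "Yes")

# Class priors P(golf)
prior <- prop.table(table(golf))

# Conditional probabilities P(weather | golf); each row (golf class) sums to 1
cond <- prop.table(table(golf, weather), margin = 1)

# Unnormalised naive Bayes score per class: P(golf) * P(weather | golf)
score <- function(w) as.vector(prior) * cond[, w]

score_good <- score("Good")
score_bad  <- score("Bad")
print(score_good)
print(score_bad)
```

For Good the score favours Yes (0.5 against roughly 0.167). For Bad the two classes tie at roughly 0.167 each, which would be consistent with the No reported in the question if the tie resolves to the first level.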

You need to give the factor in df_new the same levels as the one in df, so that both are encoded the same way:

df_new <- data.frame(weather = "Good") #Test dataset
df_new$weather <- factor(df_new$weather, levels(df$weather))
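
Putting it together, here is a hedged end-to-end sketch of the fix. It passes stringsAsFactors = TRUE explicitly so it does not depend on the R version's default, and it builds the test row directly with the training levels. The variables same_levels, train_code and new_code, and the requireNamespace guard around the e1071 call, are my own additions for illustration.

```r
# Training data from the question; stringsAsFactors = TRUE makes the factor conversion explicit
weather <- c("Good", "Good", "Good", "Bad", "Bad", "Good")
golf    <- c("Yes", "No", "Yes", "No", "Yes", "Yes")
df <- data.frame(weather, golf, stringsAsFactors = TRUE)

# Build the test row with the training levels so "Good" keeps the same integer code
df_new <- data.frame(weather = factor("Good", levels = levels(df$weather)))

# Sanity checks: identical levels, and "Good" has the same code in both data frames
same_levels <- identical(levels(df_new$weather), levels(df$weather))
train_code  <- unique(as.integer(df$weather[df$weather == "Good"]))
new_code    <- as.integer(df_new$weather)
stopifnot(same_levels, new_code == train_code)

# Optional: the prediction itself only runs when e1071 is installed
if (requireNamespace("e1071", quietly = TRUE)) {
  model <- e1071::naiveBayes(golf ~ ., data = df)
  print(predict(model, df_new, type = "class"))
}
```

Running a check like the stopifnot line before every predict call is a cheap way to catch this kind of level mismatch early.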
