data <- data.frame(day_type = c("weekend", "weekend", "weekend", "weekend",
                                "weekday", "weekday", "weekday", "weekday"),
                   vehicle = c("car", "car", "car", "car",
                               "bus", "bus", "bus", "bus"))
library(naivebayes)
model <- naive_bayes(vehicle ~ day_type, data = data)
predict(model, data.frame(day_type = "weekend"))
[1] bus
Levels: bus car
I expected the prediction to be car here, but I am getting bus. Please help me identify the error.
This will help you understand the issue:
data <- data.frame(day_type = c("weekend", "weekend", "weekend", "weekend",
                                "weekday", "weekday", "weekday", "weekday"),
                   vehicle = c("car", "car", "car", "car",
                               "bus", "bus", "bus", "bus"))
library(naivebayes)
model <- naive_bayes(vehicle ~ day_type, data = data)
dt_test1 = data.frame(day_type = "weekend")
dt_test2 = data.frame(day_type = "weekday")
dt_test3 = data.frame(day_type = c("weekend","weekday"))
predict(model, newdata = dt_test1)
# [1] bus
# Levels: bus car
predict(model, newdata = dt_test2)
# [1] bus
# Levels: bus car
predict(model, newdata = dt_test3)
# [1] car bus
# Levels: bus car
Test datasets 1 and 2 each contain a single factor level, so "weekend" and "weekday" are both coded as 1 in their respective datasets. The model only understands the codes 1 and 2 (based on the levels in your original dataset data, where weekday is 1 and weekend is 2) and doesn't care about the labels (weekday/weekend).
However, test dataset 3 contains both labels, so they get the correct codes (weekday/weekend -> 1/2).
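The coding described above can be checked in base R without the model. This is a minimal sketch, assuming the day_type column is a factor (the default for data.frame() before R 4.0); the variable names train, single and both are made up for illustration.

```r
# Factor levels are sorted alphabetically, so "weekday" is coded 1
# and "weekend" is coded 2 in the training data.
train <- factor(c("weekend", "weekend", "weekday", "weekday"))
levels(train)        # "weekday" "weekend"
as.integer(train)    # 2 2 1 1

# A factor built from a single value has only one level, coded 1.
single <- factor("weekend")
as.integer(single)   # 1, the same code that "weekday" has in train

# With both labels present, the codes line up with the training data again.
both <- factor(c("weekend", "weekday"))
as.integer(both)     # 2 1
```

So a lone "weekend" ends up with the code that meant "weekday" during training, which is consistent with the bus prediction.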
As an extreme case scenario check this:
dt_test4 = data.frame(day_type = c("January","February"))
predict(model, newdata = dt_test4)
# [1] car bus
# Levels: bus car
You will still get predictions, because those values, which the model has never seen, are coded as 1 and 2.
Therefore, as @Aaron suggested, make sure the factor levels match, or use character variables instead of factor variables.
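One way to make the levels match is to build the test factor with the training levels. The sketch below uses only base R and assumes the training column is a factor; train and test are illustrative names, and the data is a shortened copy of the original. A data frame built this way could then be passed as newdata to predict().

```r
# Shortened copy of the training data, with explicit factor columns.
train <- data.frame(
  day_type = factor(c("weekend", "weekend", "weekday", "weekday")),
  vehicle  = factor(c("car", "car", "bus", "bus"))
)

# Reuse the training levels so that "weekend" keeps its training code.
test <- data.frame(
  day_type = factor("weekend", levels = levels(train$day_type))
)
levels(test$day_type)       # "weekday" "weekend"
as.integer(test$day_type)   # 2, the same code as in train

# Alternative: keep the column as character so no integer coding is involved.
test_chr <- data.frame(day_type = "weekend", stringsAsFactors = FALSE)
class(test_chr$day_type)    # "character"
```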