Reputation: 407
Is this code correct?
library(e1701) ## Categorical data only:
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
a <- c("n","y","n","y","n","n","y","y","n","n")
names(a) <- c("V1","V2","V3","V4","V5","V6","V7","V8","V9","V10")
pred <- predict(model, a)
tab <- table(pred, a)
sum(tab[row(tab) == col(tab)]) / sum(tab)
I want to make a prediction based on a voting record using the model
Upvotes: 0
Views: 265
Reputation: 14902
It's hard to know exactly what you intended, but it seems you want to predict the party (Class) of this legislator based on his or her values of V1:V10. If that's so, then here is what you want:
library(e1071)
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
a <- data.frame(matrix(c("n","y","n","y","n","n","y","y","n","n"), nrow = 1))
names(a) <- c("V1","V2","V3","V4","V5","V6","V7","V8","V9","V10")
(pred <- predict(model, a))
# [1] democrat
# Levels: democrat republican
(pred <- predict(model, a, type = "raw"))
# democrat republican
# [1,] 0.9277703 0.0722297
The code you supplied has two mistakes. First, you are not loading the package that contains naiveBayes() correctly: the package name is actually e1071, not e1701. Second, you are not supplying the correct object to the newdata argument of predict(). That argument needs a data.frame, and you are passing a character vector, which gets treated here as 10 observations with one feature each: V1 for the first, V2 for the second, and so on. naiveBayes() doesn't care if you supply an incomplete feature list, so it still works:
> pred
[1] democrat democrat democrat democrat democrat democrat democrat democrat democrat democrat
Levels: democrat republican
> (pred <- predict(model,a, type = "raw"))
democrat republican
[1,] 0.6137931 0.3862069
[2,] 0.6137931 0.3862069
[3,] 0.6137931 0.3862069
[4,] 0.6137931 0.3862069
[5,] 0.6137931 0.3862069
[6,] 0.6137931 0.3862069
[7,] 0.6137931 0.3862069
[8,] 0.6137931 0.3862069
[9,] 0.6137931 0.3862069
[10,] 0.6137931 0.3862069
but here you are getting ten predictions that are uninformative, because each one is based on only a single feature. That's why the predictions essentially match the prior: you are updating with almost no data:
# prior class probabilities (with no model)
> prop.table(table(HouseVotes84$Class))
democrat republican
0.6137931 0.3862069
In the corrected code above, where all ten vote features are used to predict the Class of this single new observation, we get a much more confident prediction that this legislator is a Democrat, because the posterior probabilities update the prior class probabilities of 0.61 and 0.39 with far more data.
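To see concretely how the prior gets updated, here is a base-R sketch of the naive Bayes computation for a single binary feature. The conditional probabilities below are made up purely for illustration; the actual fitted values live in model$tables (e.g. model$tables$V1):

```r
# Prior class probabilities from the training data (as computed above)
prior <- c(democrat = 0.6137931, republican = 0.3862069)

# Hypothetical conditional probabilities P(V1 = "n" | class) -- illustration
# only; the real estimates are stored in model$tables$V1
lik <- c(democrat = 0.40, republican = 0.80)

# Bayes' rule: posterior is proportional to prior * likelihood, then normalize
unnorm    <- prior * lik
posterior <- unnorm / sum(unnorm)
round(posterior, 4)
```

With ten features instead of one, ten likelihood terms get multiplied in (the "naive" conditional-independence assumption), which is why the full voting record moves the posterior much further from the prior than any single vote can.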
Upvotes: 1