michael zxc858

Reputation: 407

using R naive bayes e1702

Is this code correct?
library(e1701)  ## Categorical data only:
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
a <- c("n","y","n","y","n","n","y","y","n","n")
names(a) <- c("V1","V2","V3","V4","V5","V6","V7","V8","V9","V10")
pred <- predict(model, a)
tab <- table(pred, a)
sum(tab[row(tab) == col(tab)]) / sum(tab)

I want to make a prediction based on a voting record using the model

Upvotes: 0

Views: 265

Answers (1)

Ken Benoit

Reputation: 14902

It's hard to know exactly what you intended, but it seems you want to predict the party (Class) of this legislator based on his or her values of V1:V10. If that's so, then here is what you want:

library(e1071) 
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
a <- data.frame(matrix(c("n","y","n","y","n","n","y","y","n","n"), nrow = 1))
names(a) <- c("V1","V2","V3","V4","V5","V6","V7","V8","V9","V10")
(pred <- predict(model, a))
# [1] democrat
# Levels: democrat republican
(pred <- predict(model, a, type = "raw"))
#       democrat republican
# [1,] 0.9277703  0.0722297

The code you supplied has two mistakes. First, you are not loading the package that contains naiveBayes() correctly: the package name is actually e1071, not e1701. Second, you are not supplying the correct object to newdata in predict(). That argument needs a data.frame, but you are passing a character vector, which gets treated here as 10 observations with one feature each: V1 for the first, V2 for the second, and so on. naiveBayes() doesn't care if you supply it an incomplete feature list, so it still runs:

> pred
 [1] democrat democrat democrat democrat democrat democrat democrat democrat democrat democrat
Levels: democrat republican
> (pred <- predict(model,a, type = "raw"))
       democrat republican
 [1,] 0.6137931  0.3862069
 [2,] 0.6137931  0.3862069
 [3,] 0.6137931  0.3862069
 [4,] 0.6137931  0.3862069
 [5,] 0.6137931  0.3862069
 [6,] 0.6137931  0.3862069
 [7,] 0.6137931  0.3862069
 [8,] 0.6137931  0.3862069
 [9,] 0.6137931  0.3862069
[10,] 0.6137931  0.3862069

but here you are getting ten predictions that are uninformative, because each is based on only a single feature. That's why the predictions correspond to the prior: you are updating it with almost no data:

# prior class probabilities (with no model)
> prop.table(table(HouseVotes84$Class))

  democrat republican 
 0.6137931  0.3862069 

In the corrected code above, which uses all ten vote features to predict the Class for this new (single) observation, we get a much more confident prediction that this legislator is a Democrat, because the posterior probabilities update the prior class probabilities of 0.61 and 0.39 with much more data.
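To see why more features sharpen the prediction, here is a language-neutral sketch (in Python, with hypothetical likelihood values, not the actual HouseVotes84 estimates) of the naive Bayes update: the posterior is proportional to the prior times the product of the per-feature likelihoods, so each additional feature pulls the result further from the prior.

```python
# Naive Bayes posterior update: prior times per-feature likelihoods,
# then normalize. Likelihood numbers below are made up for illustration.
prior = {"democrat": 0.6137931, "republican": 0.3862069}

# Hypothetical P(observed vote | class) for three votes:
likelihoods = {
    "democrat":   [0.9, 0.8, 0.7],
    "republican": [0.2, 0.3, 0.4],
}

def posterior(prior, likelihoods):
    # Unnormalized score per class: prior * product of likelihoods.
    scores = {}
    for cls, p in prior.items():
        s = p
        for lik in likelihoods[cls]:
            s *= lik
        scores[cls] = s
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

print(posterior(prior, likelihoods))
# With zero features the result would just be the prior (0.61 / 0.39);
# with three informative features the democrat probability rises above 0.9.
```

With an empty likelihood list the function returns the prior unchanged, which is exactly the behaviour the one-feature-per-row mistake above approximates.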

Upvotes: 1
