I have a problem setting up text classification with naive Bayes. I have three text files: two templates with good/bad words and one testing file. My TermDocumentMatrix is created, and I also have a vector of ratings according to the two rating templates:
TDM word1 word2 word3 word4 ... rating
doc1 1 1 1 good
doc2 1 1 1 bad
doc3 ...
The rating vector is not added to the TDM, because I think cbind converts the values to character. So I split the matrix into two parts:
template_train <- complete_TDM[1:(x+y),]
text_test <- data.matrix(complete_TDM[((x+y+1):nrow(complete_TDM)),])
where x is the number of rows from the good rating template and y the number of rows from the bad one.
random <- sample(x+y)
template_train <- data.matrix(template_train[random,]) ###shuffle
rating_vector <- as.factor(rating[random]) ###vector containing rating, shuffled the same way
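A minimal base-R sketch of the split-and-shuffle step, using a made-up 7 x 3 count matrix in place of complete_TDM (the counts, the term names, and x = 3, y = 2 are all hypothetical). It shows one permutation applied to both the rows and the labels, and the matrix staying numeric because the ratings live in a separate factor.

```r
# Hypothetical stand-in for complete_TDM: 3 good docs, 2 bad docs, 2 test docs
set.seed(1)
complete_TDM <- matrix(c(2, 0, 1,
                         3, 1, 0,
                         1, 0, 0,
                         0, 2, 1,
                         0, 3, 0,
                         1, 1, 1,
                         0, 1, 2),
                       ncol = 3, byrow = TRUE,
                       dimnames = list(paste0("doc", 1:7), c("word1", "word2", "word3")))
x <- 3  # rows from the good-rating template (hypothetical)
y <- 2  # rows from the bad-rating template (hypothetical)
rating <- c(rep("good", x), rep("bad", y))

# drop = FALSE keeps a matrix even if a slice has a single row
template_train <- complete_TDM[1:(x + y), , drop = FALSE]
text_test      <- complete_TDM[(x + y + 1):nrow(complete_TDM), , drop = FALSE]

# One permutation, applied to rows and labels alike, keeps each document paired with its rating
random <- sample(x + y)
template_train <- template_train[random, , drop = FALSE]
rating_vector  <- factor(rating[random], levels = c("bad", "good"))
```

Fixing the factor levels explicitly makes the level order ("bad" before "good") independent of which labels happen to come first after shuffling.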
Then I create a naiveBayes model:
naive_model <- naiveBayes(x = template_train, y = rating_vector)
and then I want to predict:
prediction <- predict(naive_model, text_test)
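The e1071 documentation describes naiveBayes as assuming a Gaussian distribution (given the class) for metric predictors, and the error message in this question shows predict() calling log() internally. To make that concrete, here is a minimal sketch of a simplified Gaussian naive Bayes in base R. The helpers fit_gnb and predict_gnb and the min_sd floor are hypothetical; this is not e1071's implementation, only an illustration of why every column has to be numeric.

```r
# Hypothetical helpers: a simplified Gaussian naive Bayes, not e1071's actual code.
# x: numeric matrix, one row per document, one column per term; y: class labels.
fit_gnb <- function(x, y) {
  stopifnot(is.numeric(x))                 # means, sds and logs need numbers, not strings
  y <- as.factor(y)
  classes <- levels(y)
  rows_of <- function(k) x[y == k, , drop = FALSE]
  list(
    classes = classes,
    prior   = as.numeric(table(y)) / length(y),    # P(class), in level order
    mean    = do.call(rbind, lapply(classes, function(k) colMeans(rows_of(k)))),
    sd      = do.call(rbind, lapply(classes, function(k) apply(rows_of(k), 2, sd)))
  )
}

predict_gnb <- function(model, newx, min_sd = 1e-3) {
  newx <- as.matrix(newx)
  stopifnot(is.numeric(newx))
  # One column of log-scores per class: log prior + sum of per-term Gaussian log densities
  scores <- do.call(cbind, lapply(seq_along(model$classes), function(k) {
    s <- pmax(model$sd[k, ], min_sd)       # floor the sd so constant columns don't give -Inf
    z <- sweep(sweep(newx, 2, model$mean[k, ], "-"), 2, s, "/")
    log(model$prior[k]) - rowSums(z^2) / 2 - sum(log(s)) - ncol(newx) * log(2 * pi) / 2
  }))
  # Pick the class with the highest log-score for each row
  factor(model$classes[max.col(scores, ties.method = "first")], levels = model$classes)
}

# Toy usage with made-up counts
train <- rbind(c(5, 0), c(4, 1), c(6, 0), c(0, 5), c(1, 4), c(0, 6))
colnames(train) <- c("word1", "word2")
labels <- c("good", "good", "good", "bad", "bad", "bad")
model <- fit_gnb(train, labels)
pred <- predict_gnb(model, rbind(c(5, 1), c(0, 5)))
as.character(pred)  # "good" "bad"
```

Working in log space turns the product of per-term densities into a sum, which is why a log() step appears in prediction at all.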
But in the last step, I receive an error:
> prediction <- predict(naive_model, text_test)
Error in log(sapply(seq_along(attribs), function(v) { :
non-numeric argument to mathematical function
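One plausible way to end up with this error (an assumption, since the question does not show how complete_TDM was built): an R matrix holds a single type, so a single non-numeric column makes every cell a string, and log() refuses strings. The sketch below uses made-up values to show the cbind concern from the question, the error message itself, and the as.matrix/data.matrix difference.

```r
# Made-up 2 x 2 count matrix
m <- matrix(c(1, 0, 2, 1), nrow = 2, dimnames = list(NULL, c("word1", "word2")))
with_rating <- cbind(m, rating = c("good", "bad"))
typeof(m)            # "double"
typeof(with_rating)  # "character": one character column turns every cell into a string

# The message from the question, reproduced by calling log() on a string
msg <- tryCatch(log(with_rating[1, "word1"]), error = conditionMessage)
msg                  # "non-numeric argument to mathematical function"

# as.matrix() vs data.matrix() on a data frame that holds a non-numeric column
df <- data.frame(word1 = c(1, 0), rating = c("good", "bad"))
typeof(as.matrix(df))        # "character"
is.numeric(data.matrix(df))  # TRUE: the non-numeric column is turned into integer codes
```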
Thanks in advance!
OK, I just solved the problem: I am now using data.matrix instead of as.matrix, and as.factor for my rating vector. But now I have a new problem: everything good is rated bad and vice versa.
> table(prediction, rating_vector)
          rating_vector
prediction bad good
      bad    0   95
      good  94    0
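One thing worth checking before trusting this table: prediction has one label per row of text_test, while rating_vector holds the shuffled labels of the training rows. Unless those are the same documents in the same order, the table does not compare each prediction with its own true rating. A small base-R illustration with hypothetical labels (test_rating stands for the true ratings of the test documents, in text_test's row order) shows how a misaligned label vector can put every count off the diagonal even when all predictions are right:

```r
# Hypothetical labels, all sharing the same level order
lv <- c("bad", "good")
prediction    <- factor(c("good", "good", "bad", "bad"), levels = lv)  # one per test row
test_rating   <- factor(c("good", "good", "bad", "bad"), levels = lv)  # true labels, same row order
rating_vector <- factor(c("bad", "bad", "good", "good"), levels = lv)  # labels of other rows

table(prediction, rating_vector)  # every count off the diagonal
table(prediction, test_rating)    # every count on the diagonal
mean(prediction == test_rating)   # accuracy
```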
You can just convert the test matrix to a data frame before calling predict:
text_test <- data.frame(text_test)
prediction <- predict(naive_model, text_test)
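A caution on this conversion, sketched with a hypothetical matrix: data.frame() runs make.names() on the column names by default, so terms that are not syntactic R names get renamed. naiveBayes stores one table per predictor, named after the training columns, so it is worth making sure the test columns keep the same names; check.names = FALSE leaves them untouched. The term names and the one-row template_train below are made up for illustration.

```r
# Hypothetical term names; "don't" and "2nd" are not syntactic R names
text_test <- matrix(c(1, 0, 2, 0, 1, 1), nrow = 2,
                    dimnames = list(c("doc1", "doc2"), c("good", "don't", "2nd")))

names(data.frame(text_test))                       # "good"  "don.t" "X2nd"
names(data.frame(text_test, check.names = FALSE))  # "good"  "don't" "2nd"

# Compare against the training columns before predicting (hypothetical training matrix)
template_train <- matrix(0, nrow = 1, ncol = 3,
                         dimnames = list(NULL, c("good", "don't", "2nd")))
setdiff(colnames(template_train), names(data.frame(text_test)))  # "don't" "2nd" would not match
```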