Reputation: 31
I am new to R and am trying to work out why my predict call fails with a "subscript out of bounds" error. This is probably an easy fix, since I am working through an introductory example.
I trained my classifier on the training data:
sms_classifier <- naiveBayes(sms_train, sms_train_labels)
but an error occurs when I try to call the predict function:
sms_test_pred <- predict(sms_classifier, sms_test)
The error given is:
Error in `[.default`(object$tables[[v]], , nd + islogical[attribs[v]]) : subscript out of bounds
Upvotes: 3
Views: 2538
Reputation: 31
Assuming you are building a spam classifier from a document-term matrix, this happens when the test dataset contains terms (which end up as factor levels) that are not present in the training dataset. You can drop these very rare terms with the code below:
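# Keep only terms that occur at least 5 times in the training matrix
freq_terms <- findFreqTerms(dtm.train, 5)
# Rebuild both matrices against the same dictionary so their columns match
reduced_dtm.train <- DocumentTermMatrix(corpus.train, control = list(dictionary = freq_terms))
reduced_dtm.test <- DocumentTermMatrix(corpus.test, control = list(dictionary = freq_terms))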
The above code drops the less frequent terms (they carry little signal for the classifier anyway) and builds both matrices from the same dictionary, so the terms in the test dataset match those in the training dataset. The predict function should then no longer throw this error.
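If the counts are later converted to categorical "Yes"/"No" features, as many introductory SMS spam examples do (an assumption here, since the question does not show that step), a term that takes only one value in training can still leave train and test with different levels. A minimal base-R sketch of one way to guard against that follows; `convert_counts` is a hypothetical helper and the toy counts are made up for illustration:

```r
# Hypothetical helper: turn term counts into a factor with fixed levels,
# so every column carries both "No" and "Yes" even if only one value occurs.
convert_counts <- function(x) {
  factor(ifelse(x > 0, "Yes", "No"), levels = c("No", "Yes"))
}

# Toy term counts (made up); the term "prize" never occurs in training.
train_counts <- data.frame(free = c(1, 0, 2), prize = c(0, 0, 0))
test_counts  <- data.frame(free = c(0, 1),    prize = c(1, 0))

# Convert column by column; lapply keeps each column as a factor.
train_features <- as.data.frame(lapply(train_counts, convert_counts))
test_features  <- as.data.frame(lapply(test_counts, convert_counts))

# Check that both sets carry the same levels for every term.
same_levels <- identical(lapply(train_features, levels),
                         lapply(test_features, levels))
print(same_levels)
```

Features built this way could then be passed to naiveBayes and predict in place of sms_train and sms_test, assuming those objects hold term counts.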
Upvotes: 1