Implementing Naive Bayes for text classification using Quanteda

Question

I have a dataset of BBC articles with two columns: 'category' and 'text'. I need to construct a Naive Bayes classifier that predicts the category (e.g. business, entertainment) of an article based on its text.

I'm attempting this with Quanteda and have the following code:

library(quanteda)

bbc_data <- read.csv('bbc_articles_labels_all.csv')
text <- textfile('bbc_articles_labels_all.csv', textField='text')
bbc_corpus <- corpus(text)
bbc_dfm <- dfm(bbc_corpus, ignoredFeatures = stopwords("english"), stem=TRUE)


# 80/20 split for training and test data
trainclass <- factor(c(bbc_data$category[1:1780], rep(NA, 445)))
testclass <- factor(c(bbc_data$category[1781:2225]))

bbcNb <- textmodel_NB(bbc_dfm, trainclass)
bbc_pred <- predict(bbcNb, testclass)

It seems to work smoothly until predict(), which gives:

Error in newdata %*% log.lik : 
  requires numeric/complex matrix/vector arguments

Can anyone provide insight on how to resolve this? I'm still getting the hang of text analysis and quanteda. Thank you!

Here is a link to the dataset.

Answers (1)
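
The error most likely comes from the second argument to predict(). In the question, `testclass` is a factor of category labels, but predict() on a Naive Bayes text model expects `newdata` to be a document-feature matrix (dfm) of the documents to classify. Internally the model multiplies `newdata` by the log-likelihood matrix, which is why a factor triggers "requires numeric/complex matrix/vector arguments". The usual fix is to split the dfm itself into training and test rows, fit on the training rows, and predict on the test rows.

Below is a minimal sketch, assuming quanteda version 2 or later, where `textmodel_nb()` lives in the separate quanteda.textmodels package, and where `textfile()` and the `ignoredFeatures`/`stem` arguments of `dfm()` from the question are no longer available. The small data frame is a hypothetical stand-in for `read.csv('bbc_articles_labels_all.csv')`, and the names `train_idx`, `test_idx`, `dfm_train` and `dfm_test` are mine, not from the question.

```r
library(quanteda)
library(quanteda.textmodels)  # provides textmodel_nb() in quanteda >= 2.0

# Hypothetical toy data with the same two columns as the BBC file.
# With the real data this would be:
#   bbc_data <- read.csv('bbc_articles_labels_all.csv', stringsAsFactors = FALSE)
bbc_data <- data.frame(
  category = c("business", "business", "business",
               "entertainment", "entertainment", "entertainment"),
  text = c("shares rose as the bank reported profit",
           "the company profit fell and shares dropped",
           "market investors bought bank shares",
           "the film won an award at the festival",
           "the singer released a new album",
           "the actor stars in a new film"),
  stringsAsFactors = FALSE
)

# Build the corpus from the data frame, then tokenize, drop stopwords, and stem
bbc_corpus <- corpus(bbc_data, text_field = "text")
bbc_toks <- tokens(bbc_corpus, remove_punct = TRUE)
bbc_toks <- tokens_remove(bbc_toks, stopwords("english"))
bbc_toks <- tokens_wordstem(bbc_toks)
bbc_dfm <- dfm(bbc_toks)

# Split the dfm by rows so that both the training and test sets are dfm objects.
# The indices are fixed here for the toy data; for the real data something like
# train_idx <- sample(nrow(bbc_data), 1780) would give a random 80/20 split.
train_idx <- c(1, 2, 4, 5)
test_idx <- c(3, 6)
dfm_train <- bbc_dfm[train_idx, ]
dfm_test <- bbc_dfm[test_idx, ]

# Fit on the training documents and their labels only
bbc_nb <- textmodel_nb(dfm_train, y = bbc_data$category[train_idx])

# Align test features with the training features, then predict on the test dfm
dfm_test <- dfm_match(dfm_test, features = featnames(dfm_train))
bbc_pred <- predict(bbc_nb, newdata = dfm_test)

# Cross-tabulate predictions against the held-out labels
table(predicted = bbc_pred, actual = bbc_data$category[test_idx])
```

Two things to watch with the real file: if the rows are ordered by category, taking the first 1780 rows as training data can leave whole categories out of the training or test set, so a random split is probably safer. Also, the `dfm_match()` call is redundant when both subsets come from the same dfm, but it matters as soon as the test documents are processed separately.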
