Caleb

Reputation: 11

textProcessor changes the number of observations in my corpus (using the stm package in R)

I'm working with a dataset that has 439 observations for text analysis in stm. When I use textProcessor, the number of observations drops to 438 for some reason. This creates problems later on, for example when using the findThoughts() function.

##############################################
# PREPROCESSING
##############################################

# Process the data for analysis.
temp <- textProcessor(sovereigncredit$Content, sovereigncredit,
                      customstopwords = customstop, stem = FALSE)
meta  <- temp$meta
vocab <- temp$vocab
docs  <- temp$documents

length(docs)                     # QUESTION: why is this 438 instead of 439, like the original dataset?
length(sovereigncredit$Content)  # See, the original is 439.

out   <- prepDocuments(docs, vocab, meta)
docs  <- out$documents
vocab <- out$vocab
meta  <- out$meta

An example of this becoming a problem down the line is:

thoughts1 <- findThoughts(sovereigncredit1, texts = sovereigncredit$Content, n = 5, topics = 1)

For which the output is:

"Error in findThoughts(sovereigncredit1, texts = sovereigncredit$Content, : Number of provided texts and number of documents modeled do not match"

Here "sovereigncredit1" is a topic model fit on "out" from above.

If my interpretation is correct (and I'm not making another mistake), the problem seems to be this one-observation difference between the counts before and after textProcessor.

So far, I've checked the original csv and confirmed that there are in fact 439 valid observations and no empty rows. I'm not sure what's going on. Any help would be appreciated!

Upvotes: 1

Views: 935

Answers (1)

bstewart

Reputation: 508

stm can't handle empty documents, so we simply drop them. textProcessor removes a lot from texts: custom stopwords, words shorter than 3 characters, numbers, etc. So what's happening here is that one of your documents (whichever one is dropped) loses all of its contents at some point during the various steps textProcessor performs.
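A minimal sketch of this behavior (a toy corpus, not the asker's data): a document consisting only of stopwords, numbers, and short words ends up empty after preprocessing and is dropped. textProcessor reports the indices of dropped documents in its `docs.removed` element.

```r
library(stm)

# The second "document" contains only stopwords, numbers,
# and words shorter than 3 characters, so preprocessing
# empties it and textProcessor drops it.
toy <- c("sovereign credit ratings affect borrowing costs",
         "the a an 12 34 of to")

processed <- textProcessor(toy)

length(processed$documents)  # 1, not 2
processed$docs.removed       # index of the dropped document
```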

You can work out which document it was and decide what you want to do about it in this instance. In general, if you want more control over text manipulation, I would strongly recommend the quanteda package, which has much more fine-grained tools than stm for turning texts into a document-term matrix.
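Applied to the asker's code, tracing the dropped document and keeping findThoughts() aligned might look like the following sketch. It assumes `temp` and `out` from the question, and that `Content` is a column of the metadata passed to textProcessor (so the surviving rows travel along in `temp$meta` and `out$meta`).

```r
# Index of the document textProcessor dropped,
# relative to the original 439 rows:
temp$docs.removed

# Inspect the offending text to see why it emptied out:
sovereigncredit$Content[temp$docs.removed]

# prepDocuments can drop further documents; its removals
# are reported the same way:
out$docs.removed

# For findThoughts, pass texts that stay aligned with the
# modeled documents, i.e. from the processed metadata rather
# than the original data frame:
thoughts1 <- findThoughts(sovereigncredit1, texts = out$meta$Content,
                          n = 5, topics = 1)
```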

Upvotes: 1
