Katie Nissen

stm function won't stop running?

I am trying to fit a structural topic model to some documents with the stm package, but I cannot get stm() to finish, even on a trivial subset of the documents. I have about 5,000 documents in total, but I'm using only the first 20 to test the code, and I'm not including any covariates in this model. I have already analyzed the texts with LDA, and I just want to confirm that an STM without covariates gives similar results. When I run the code below, the console shows the following and then never changes:

Beginning Spectral Initialization 
     Calculating the gram matrix...
     Finding anchor words...
    ..........
     Recovering initialization...
    ..
Initialization complete. 

I've had to close out R entirely to get it to quit running. Here is the code:

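library(quanteda)
library(stm)

test <- files[1:20, ]  # the first 20 documents
met  <- mytab[1:20, ]  # the matching 20 rows of the metadata data frame

corp <- corpus(test, text_field = "text")
# remove_numbers belongs in tokens(); dfm() does not apply it to a tokens object
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, c(stopwords("english"), "s", "$"))
toks <- tokens_wordstem(toks)
dfmat <- dfm(toks)  # avoid naming the object "dfm", which masks the function

# drop empty documents and the matching metadata rows so the two stay aligned
keep <- ntoken(dfmat) > 0
dfm2 <- dfm_subset(dfmat, keep)
met2 <- met[keep, ]

# use the quanteda converter to turn the dfm into stm's input format
stmdfm <- convert(dfm2, to = "stm", docvars = met2)
plotRemoved(stmdfm$documents, lower.thresh = seq(1, 80, by = 20))
out <- prepDocuments(stmdfm$documents, stmdfm$vocab, stmdfm$meta, lower.thresh = 3)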

k <- 10
stmFit <- stm(out$documents, 
              out$vocab, 
              K = k,  
              max.em.its = 150, 
              data = out$meta, 
              init.type = "Spectral", 
              seed = 300)

Any thoughts on why this code never produces results? I expect stm() to take a while, but I have left it running on just these 20 documents (plain-text files averaging only a few hundred words each) for over 30 minutes without any result.

Answers (1)

Katie Nissen

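The very simple answer was to reinstall the Rcpp package from scratch. The run stalled right after "Initialization complete.", which is where stm() starts its EM iterations, and its per-document E-step is compiled C++ code linked against Rcpp, so a damaged Rcpp installation fits the symptom: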

# do this in a fresh R session, so Rcpp is not already loaded while it is being replaced
install.packages("Rcpp")
library(Rcpp)
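If you hit the same symptom, one way to tell a broken installation apart from a problem with your own corpus is a deliberately short run on data that ships with stm. The sketch below is an illustration, not part of the original answer: it assumes quanteda is installed alongside stm, uses the package's bundled `gadarian` survey responses, and the object names, `K = 3` and `max.em.its = 3` are arbitrary choices. If iteration messages appear after "Initialization complete." and the call finishes quickly, the compiled code is working; if it stalls at the same point as before, the installation is still suspect.

```r
library(quanteda)
library(stm)

# gadarian ships with stm: short open-ended survey responses.
# as.character() guards against the column being stored as a factor.
texts <- as.character(gadarian$open.ended.response)

# Same kind of quanteda preprocessing as in the question
toks <- tokens(corpus(texts), remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_remove(tokens_tolower(toks), stopwords("english"))
dfmat <- dfm(toks)
dfmat <- dfm_subset(dfmat, ntoken(dfmat) > 0)  # stm cannot use empty documents
small <- convert(dfmat, to = "stm")

# A few EM iterations are enough to see whether the loop advances at all
timing <- system.time(
  fit_small <- stm(small$documents, small$vocab, K = 3,
                   max.em.its = 3, init.type = "Spectral",
                   seed = 300, verbose = TRUE)
)
print(timing)
```

Capping `max.em.its` keeps the check short: the goal is only to see the EM loop move past initialization, not to get a usable model.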
