rek

Reputation: 187

Use dfm in searchK calculation

The stm package provides searchK() to help find the optimal number of topics K for a topic model, using a process like this:

library(stm)
library(quanteda)
library(ggplot2)

temp<-textProcessor(documents=gadarian$open.ended.response,metadata=gadarian)
out <- prepDocuments(temp$documents, temp$vocab, temp$meta)
documents <- out$documents
vocab <- out$vocab
meta <- out$meta
set.seed(02138)
K<-c(5,10,15)
df1 <- searchK(documents, vocab, K, data=meta)

In this example, textProcessor() and prepDocuments() apply specific preprocessing (stemming, etc.). How can I change this preprocessing and instead use a dfm like the one below as the input to searchK()?

myDfm <- gadarian$open.ended.response %>%
     tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)  %>%
     dfm()

Upvotes: 1

Views: 483

Answers (1)

Ken Benoit

Reputation: 14902

Use quanteda's convert(x, to = "stm") function to get the list that searchK() needs. So add this:

out <- convert(myDfm, to = "stm")

Then, the same code from above will work:

documents <- out$documents
vocab <- out$vocab
meta <- out$meta
set.seed(02138)
K <- c(5, 10, 15)
df1 <- searchK(documents, vocab, K, data = meta)
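
One caveat: a dfm built straight from a character vector has no document variables, so out$meta will likely carry no covariates. If you want to use the metadata in searchK(), a minimal sketch (assuming the treatment and pid_rep columns of gadarian are the ones of interest, and that open.ended.response can be coerced with as.character()) is to attach them as docvars before building the dfm. The names corp, K_small and df2 are only illustrative:

```r
library(stm)
library(quanteda)

# Build a corpus that carries the covariates as document variables
# (assumption: treatment and pid_rep are the columns you care about)
corp <- corpus(as.character(gadarian$open.ended.response),
               docvars = gadarian[, c("treatment", "pid_rep")])

# Same tokenization choices as in the question, without stemming
myDfm <- dfm(tokens(corp, remove_punct = TRUE,
                    remove_numbers = TRUE,
                    remove_symbols = TRUE))

# convert() returns a list with documents, vocab and meta
out <- convert(myDfm, to = "stm")

set.seed(02138)
K_small <- c(5, 10)  # illustrative values, kept small so the sketch runs quickly
df2 <- searchK(out$documents, out$vocab, K_small,
               prevalence = ~ treatment, data = out$meta)
```

Because the covariates travel with the documents as docvars, the rows of out$meta should stay aligned with out$documents, including when convert() drops empty documents.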

Upvotes: 2
