FrauHahnhen

Reputation: 153

Error with function topicmodels::lda in R

I'm trying to use the LDA model from the topicmodels package in R. I need to measure the method's instability, so I have generated true parameters from the Dirichlet distribution for w = 3000 words, t = 8 topics and d = 50 documents, with approximately 60 words in each one:

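Theta = t(rdirichlet(d, alpha))   # t x d: topic proportions of each document
Phi = t(rdirichlet(t, beta))      # w x t: word distribution of each topic
docs = matrix(0, nrow = d, ncol = w)

for (i in 1:d) {
    curn = rnorm(1, mean = 60, sd = 10)    # length of document i
    for (j in 1:curn) {
        curt = rdiscrete(1, Theta[, i], 1:t)    # topic of the j-th word, from document i's mixture
        curw = rdiscrete(1, Phi[, curt], 1:w)   # word drawn from that topic
        docs[i, curw] = docs[i, curw] + 1
    }
}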

So my docs matrix is a sparse d x w matrix, and almost all of its elements are 0 or 1.
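For readers who want to run this without extra packages, here is a base-R sketch of the same generative process. It is not the original poster's code: rdirichlet() (found in packages such as MCMCpack or gtools) is replaced by the standard trick of normalising independent gamma draws, rdiscrete() (e1071) is replaced by sample.int(), and the number of topics is called k to keep it distinct from the transpose function t(). The hyperparameter values alpha = 0.1 and beta = 0.1 are assumptions, since the question does not state them.

```r
# Base-R sketch of the simulation; alpha and beta values are assumptions.
set.seed(42)
w <- 3000  # vocabulary size
k <- 8     # number of topics (t in the question)
d <- 50    # number of documents
alpha <- rep(0.1, k)  # assumed symmetric prior on topic proportions
beta  <- rep(0.1, w)  # assumed symmetric prior on word distributions

# n draws from Dirichlet(a): normalise independent Gamma(a_j, 1) variables.
# Returns an n x length(a) matrix whose rows sum to 1.
rdirichlet_base <- function(n, a) {
  g <- matrix(rgamma(n * length(a), shape = a), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

Theta <- t(rdirichlet_base(d, alpha))  # k x d: column i = topic mixture of document i
Phi   <- t(rdirichlet_base(k, beta))   # w x k: column z = word distribution of topic z

docs <- matrix(0L, nrow = d, ncol = w)
for (i in seq_len(d)) {
  n_words <- max(1L, round(rnorm(1, mean = 60, sd = 10)))  # length of document i
  for (j in seq_len(n_words)) {
    z <- sample.int(k, 1, prob = Theta[, i])  # topic of this word, from document i's mixture
    v <- sample.int(w, 1, prob = Phi[, z])    # word drawn from topic z
    docs[i, v] <- docs[i, v] + 1L
  }
}
```

Keeping docs as an integer matrix is convenient because topicmodels expects integer term counts once the matrix is converted to a DocumentTermMatrix.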

Then I need my docs matrix to be an object of the DocumentTermMatrix class so that I can use it in topicmodels::lda():

docs = as.DocumentTermMatrix(docs, weighting = weightTf)

I need to use the Gibbs sampling method, so I write

ldafitmodel <- lda(docs, t, method = "Gibbs")

And then I get:

Error in lda.default(docs, t, method = "Gibbs") : nrow(x) and length(grouping) are different

I guess the topicmodels package uses the MASS package, but then this grouping parameter is something I can't control explicitly, can I? Or what am I doing wrong with my data?

Please help me!

BR, Maria

Upvotes: 1

Views: 2820

Answers (1)

David

Reputation: 9405

Your exact problem isn't reproducible because you don't define d, t, w, alpha or beta, and you don't load the packages that provide rdirichlet() and rdiscrete(). However, I'm pretty sure your problem is that you are calling lda() from the MASS package, which performs linear discriminant analysis rather than latent Dirichlet allocation, instead of LDA() from the topicmodels package. R is case sensitive, so the capitalization makes a difference. The second argument of MASS's lda() is grouping (one class label per row of the data), which is why passing the number of topics there produces "nrow(x) and length(grouping) are different". Also, if you run into a similar problem in the future with objects that have exactly the same name, you can specify the exact object you want through its namespace with ::, for example topicmodels::LDA().
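A small sketch of the name clash, assuming both MASS and topicmodels are installed: find() reports which attached package a name comes from, and calling MASS::lda() on a toy matrix x (hypothetical data, not from the question) reproduces the error message, because the number 4 is taken as a grouping vector of length 1.

```r
library(MASS)        # attaches lda(): linear discriminant analysis
library(topicmodels) # attaches LDA(): latent Dirichlet allocation

# R is case sensitive, so these names refer to unrelated functions
find("lda")  # "package:MASS"
find("LDA")  # "package:topicmodels"

# MASS::lda(x, grouping, ...) expects one class label per row of x,
# so a single number (the intended topic count) fails its length check
x <- matrix(rnorm(20), nrow = 10)  # hypothetical 10 x 2 data
msg <- tryCatch(MASS::lda(x, 4), error = conditionMessage)
msg  # "nrow(x) and length(grouping) are different"

# The pkg::fun form names the intended function explicitly
identical(LDA, topicmodels::LDA)  # TRUE
```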

Anyway, I can't reproduce your example, but the following session should illustrate your error and a working solution.

> library(topicmodels)
> data(AssociatedPress)
> docs = AssociatedPress[1:100]
> ldafitmodel <- lda(docs, 4, method = "Gibbs")
Error in lda.default(docs, 4, method = "Gibbs") : 
  nrow(x) and length(grouping) are different
> (ldafitmodel <- LDA(docs, 4, method = "Gibbs"))
A LDA_Gibbs topic model with 4 topics. 
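Applied to a count matrix shaped like the one in the question (50 documents, 3000 terms, 8 topics), the corrected call might look like the sketch below. The Poisson counts are a hypothetical stand-in for the simulated docs matrix (about 60 words per document on average), the document and term names are invented so that the DocumentTermMatrix has dimnames, and the control settings (seed, iter) are illustrative choices, not recommendations.

```r
library(tm)          # as.DocumentTermMatrix(), weightTf
library(topicmodels) # LDA(): latent Dirichlet allocation

set.seed(1)
d <- 50; w <- 3000; k <- 8

# Hypothetical stand-in for the simulated matrix: sparse integer counts
docs <- matrix(rpois(d * w, lambda = 0.02), nrow = d, ncol = w,
               dimnames = list(paste0("doc", seq_len(d)),
                               paste0("term", seq_len(w))))

# LDA() needs term-frequency counts and at least one non-zero entry per document
docs <- docs[rowSums(docs) > 0, , drop = FALSE]
dtm <- as.DocumentTermMatrix(docs, weighting = weightTf)

# Capital letters: topicmodels::LDA, not MASS::lda
fit <- topicmodels::LDA(dtm, k = k, method = "Gibbs",
                        control = list(seed = 1L, iter = 500L))
fit
```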

Upvotes: 4
