Reputation: 153
I'm trying to use the LDA model from the topicmodels package in R. I need to measure the method's instability, so I generated true parameters from the Dirichlet distribution for w = 3000 words, t = 8 topics and d = 50 documents with approximately 60 words in each one:
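Theta = t(rdirichlet(d, alpha))
Phi = t(rdirichlet(t, beta))
docs = matrix(0, nrow = d, ncol = w)
for (i in 1:d) {
  # document length: roughly 60 words, rounded to a whole number
  curn = round(rnorm(1, mean = 60, sd = 10))
  for (j in 1:curn) {
    # draw a topic from document i's topic distribution
    curt = rdiscrete(1, Theta[, i], 1:t)
    # draw a word from the chosen topic's word distribution
    curw = rdiscrete(1, Phi[, curt], 1:w)
    docs[i, curw] = docs[i, curw] + 1
  }
}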
So my docs matrix is a sparse d × w matrix, and almost all of its elements are 0 or 1.
Then I need my docs matrix to be an object of the DocumentTermMatrix class so that I can use it with topicmodels:
docs = as.DocumentTermMatrix(docs, weighting = weightTf)
I need to use the Gibbs sampling method, so I write:
ldafitmodel <- lda(docs, t, method = "Gibbs")
And then I get:
Error in lda.default(docs, t, method = "Gibbs") : nrow(x) and length(grouping) are different
I guess the topicmodels package uses the MASS package, but then this grouping parameter is something I can't control explicitly, can I? Or what am I doing wrong with my data?
Please help me!
BR, Maria
Upvotes: 1
Views: 2820
Reputation: 9405
Your exact problem isn't reproducible because you don't define d, t, w, alpha or beta, and you don't load the packages that provide your rdirichlet() and rdiscrete() function calls. However, I'm pretty sure your problem is that you are calling the lda() function from the MASS package (which is for linear discriminant analysis, not latent Dirichlet allocation) instead of the LDA() function from the topicmodels package. R is case sensitive, so the capitalization makes a difference. Also, as a note: if you run into a similar problem in the future with objects that have exactly the same name, you can specify the exact object you want via its namespace using ::, for example topicmodels::LDA().
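As a quick way to see which function a given name resolves to, something like the following sketch may help. It uses only base R plus MASS. The topicmodels part is guarded so it is skipped when that package is not installed, and the variable names lda_mass and lda_topic are just illustrative.

```r
library(MASS)  # attaches lda(), the linear discriminant analysis function

# List every attached package that defines an object with this exact name.
find("lda")
find("LDA")  # empty unless a package providing LDA() (e.g. topicmodels) is attached

# Pin the function to a package with :: and check which namespace owns it.
lda_mass <- MASS::lda
environmentName(environment(lda_mass))

# Same check for the topic-model function, only if topicmodels is installed.
if (requireNamespace("topicmodels", quietly = TRUE)) {
  lda_topic <- topicmodels::LDA
  environmentName(environment(lda_topic))
}
```

If find("lda") reports package:MASS, then a call to lda() in your session is being dispatched to the discriminant analysis code, which is where the grouping argument in the error message comes from.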
Anyway, I can't reproduce your example but I think this example should illustrate your error and a working solution.
> library(topicmodels)
> data(AssociatedPress)
> docs = AssociatedPress[1:100]
> ldafitmodel <- lda(docs, 4, method = "Gibbs")
Error in lda.default(docs, 4, method = "Gibbs") :
nrow(x) and length(grouping) are different
> (ldafitmodel <- LDA(docs, 4, method = "Gibbs"))
A LDA_Gibbs topic model with 4 topics.
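For completeness, here is a rough, self-contained sketch of the simulation from the question, written with base R only. It is not the asker's original code: the Dirichlet draws use normalised gamma variates in place of rdirichlet(), sample() replaces rdiscrete(), the number of topics is called k instead of t, and the values of alpha and beta are assumptions, since the question does not give them. The conversion and fitting step runs only if tm and topicmodels are installed.

```r
set.seed(1)

# Sizes from the question; alpha and beta are assumed values.
w <- 3000  # vocabulary size
k <- 8     # number of topics (t in the question)
d <- 50    # number of documents
alpha <- rep(0.5, k)
beta  <- rep(0.1, w)

# Draw n Dirichlet vectors (one per row) by normalising independent gamma draws.
rdirichlet_base <- function(n, conc) {
  g <- matrix(rgamma(n * length(conc), shape = conc), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

Theta <- t(rdirichlet_base(d, alpha))  # k x d: topic proportions per document
Phi   <- t(rdirichlet_base(k, beta))   # w x k: word distribution per topic

docs <- matrix(0L, nrow = d, ncol = w)
for (i in 1:d) {
  # Document length: about 60 words, forced to be a positive whole number.
  n_words <- max(1, round(rnorm(1, mean = 60, sd = 10)))
  # One topic per word, drawn from document i's topic proportions.
  topics <- sample(k, n_words, replace = TRUE, prob = Theta[, i])
  for (z in topics) {
    word <- sample(w, 1, prob = Phi[, z])
    docs[i, word] <- docs[i, word] + 1L
  }
}
colnames(docs) <- paste0("w", seq_len(w))

# Convert and fit only when the required packages are available.
if (requireNamespace("tm", quietly = TRUE) &&
    requireNamespace("topicmodels", quietly = TRUE)) {
  dtm <- tm::as.DocumentTermMatrix(docs, weighting = tm::weightTf)
  fit <- topicmodels::LDA(dtm, k = k, method = "Gibbs")
}
```

Calling topicmodels::LDA() with the namespace prefix avoids any clash with MASS::lda(), regardless of which packages happen to be attached.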
Upvotes: 4