Economist_Ayahuasca

Reputation: 1642

Infer topic distributions on new, unseen documents with LDA and Gensim

Suppose I have generated a latent Dirichlet allocation model of Corpus1 using the basic command:

ldamodel = gensim.models.ldamodel.LdaModel(corpus1, num_topics=25, id2word = dictionary, passes=50, minimum_probability=0)

My question is: how can I classify new documents from, say, `Corpus2`?

I am trying to use the command `print(ldamodel[Corpus2[1]])` to obtain the topic distribution for the first document, but I get the following error:

ValueError: not enough values to unpack (expected 2, got 1)

I am very confused about which class the object `Corpus2` should be. Any suggestions on where to find more information or a tutorial are more than welcome.

Upvotes: 1

Views: 823

Answers (1)

shishir sheshadri

Reputation: 26

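I faced a similar issue. Ensure that corpus2 has the same representation as corpus1. By the looks of it, I'm guessing Corpus2[1] is a list of the words appearing in a document. Vectorize it the same way you vectorized corpus1, using the same dictionary (in Gensim, dictionary.doc2bow). If corpus1 was tf-idf weighted before training, apply the same tf-idf transformation as well, and then feed the result to the model. That way, each entry of the document has two elements: (word_id, count) for plain bag-of-words, or (word_id, tf-idf value) after the transformation.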

Upvotes: 1
