How does LDA (Latent Dirichlet Allocation) inference from `gensim` work for new data?

Question

I am training an LDA model with gensim and then applying it to a test corpus with `ldamodel[doc_term_matrix_test]`. This works fine, but I don't understand how the prediction is actually done using the trained model, i.e. what `ldamodel[doc_term_matrix_test]` is doing.

Here is the code:

import gensim
from gensim import corpora

# train and test are lists of tokenized documents (lists of strings).
dictionary2 = corpora.Dictionary(test)
dictionary = corpora.Dictionary(train)
dictionary.merge_with(dictionary2)
doc_term_matrix2 = [dictionary.doc2bow(doc) for doc in test]
doc_term_matrix = [dictionary.doc2bow(doc) for doc in train]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(doc_term_matrix, num_topics=2, id2word=dictionary,
               random_state=100, iterations=50, passes=1)
topics = sorted(ldamodel[doc_term_matrix2],
                key=lambda x: x[1],
                reverse=True)

Inon Peled · Accepted Answer

To quote from the gensim docs about ldamodel:

This module allows both LDA model estimation from a training corpus and inference of topic distribution on new, unseen documents.

So apparently, what your code does is not quite "prediction" but rather inference. That is, for every test document T, your trained LDA model yields an estimate of the topic distribution of T.
