sasha

Reputation: 25

How does LDA (Latent Dirichlet Allocation) inference in `gensim` work for new data?

I am training an LDA model with gensim and predicting on a test corpus with `ldamodel[doc_term_matrix2]`. It works fine, but I don't understand how the prediction is actually done with the trained model, i.e. what `ldamodel[doc_term_matrix2]` is doing.

Here is the code:

import gensim
from gensim import corpora

# Build the vocabulary from the training documents and merge in the test vocabulary
dictionary2 = corpora.Dictionary(test)
dictionary = corpora.Dictionary(train)
dictionary.merge_with(dictionary2)
# Convert each tokenized document to a bag-of-words vector
doc_term_matrix2 = [dictionary.doc2bow(doc) for doc in test]
doc_term_matrix = [dictionary.doc2bow(doc) for doc in train]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(doc_term_matrix, num_topics=2, id2word=dictionary,
               random_state=100, iterations=50, passes=1)
# Infer the topic distribution of every test document
topics = sorted(ldamodel[doc_term_matrix2], key=lambda x: x[1], reverse=True)

Upvotes: 2

Views: 2089

Answers (1)

Inon Peled

Reputation: 711

To quote the gensim documentation for `ldamodel`:

This module allows both LDA model estimation from a training corpus and inference of topic distribution on new, unseen documents.

So apparently, what your code does is not quite "prediction" but rather inference. That is, for every test document T, your trained LDA model yields an estimate of T's topic distribution.

Upvotes: 2
