Reputation: 1230
I have trained an LDA model using gensim on a text_corpus.
>lda_model = gensim.models.ldamodel.LdaModel(text_corpus, 10)
Now, to infer the topic distribution of a new document text_sparse_vector, I do
>lda_model[text_sparse_vector]
[(0, 0.036479568280206563), (3, 0.053828073308160099), (7, 0.021936618544365804), (11, 0.017499953446152686), (15, 0.010153090454090822), (16, 0.35967516223499041), (19, 0.098570351997275749), (26, 0.068550060242800928), (27, 0.08371562828754453), (28, 0.14110945630261607), (29, 0.089938130046832571)]
But how do I get the word distribution for each of the corresponding topics? For example, how do I find the top 20 words for topic number 16?
The class gensim.models.ldamodel.LdaModel has a method called show_topics(topics=10, topn=10, log=False, formatted=True), but, as the documentation says, it shows a randomly selected list of topics.
Is there a way to map the inferred topic numbers to their word distributions?
Upvotes: 1
Views: 4603
Reputation: 1212
The last line here changes the number of words printed per topic. Hope this helps :)
# train the LDA model
lda_model = gensim.models.LdaMulticore(bow_corpus, num_topics=20, id2word=dictionary, passes=2, workers=2, chunksize=400000)
# check out the topics (-1 means "all topics")
for idx, topic in lda_model.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))
# swap out '30' for any number and this line will give you that many words per topic :)
lda_model.print_topics(num_topics=-1, num_words=30)
Upvotes: 0
Reputation: 383
Or if you have K topics, then:
print(str(["Topic #"+str(k)+":\n" + str(lda.show_topic(k,topn=20)) for k in range(K)]))
will get you ugly, but consistently sorted output.
Upvotes: 0
Reputation: 157
lda.print_topic(x, topn=20)
will get you the top 20 features for topic x
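To make this concrete, here is a minimal, self-contained sketch using show_topic (which returns (word, probability) pairs you can work with programmatically) alongside print_topic (which returns a formatted string). The toy documents below are made up purely for illustration:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # tiny made-up corpus, just to have something to train on
    docs = [["cat", "dog", "pet"],
            ["python", "code", "bug"],
            ["dog", "bark", "pet"]]
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]

    lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

    # top 5 words for topic 1 as (word, probability) tuples
    print(lda.show_topic(1, topn=5))
    # the same topic as a single formatted string
    print(lda.print_topic(1, topn=5))

So after inferring lda_model[text_sparse_vector], you can look up any topic number from the result (e.g. 16) with show_topic(16, topn=20).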
Upvotes: 6