Reputation: 122260
The lda.show_topics() method in the following code only prints the distribution of the top 10 words for each topic. How do I print the full distribution over all the words in the corpus?
from gensim import corpora, models
documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
lda = models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=2)
for i in lda.show_topics():
    print(i)
Upvotes: 8
Views: 7890
Reputation: 1831
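The code below prints each topic's words together with their probabilities. It prints the top 10 words; change num_words=10 to print more words per topic. When the model is built with id2word=dictionary, as in the question, recent gensim versions return the word strings directly from show_topics(formatted=False), so no dictionary lookup is needed.
for words in lda.show_topics(formatted=False, num_words=10):
    # words[0] is the topic id, words[1] is a list of (word, probability) pairs
    print(words[0])
    print("******************************")
    for word_prob in words[1]:
        print("(", word_prob[0], ",", word_prob[1], ")", end="")
    print("")
    print("******************************")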
Upvotes: 0
Reputation: 61
show_topics() takes two parameters, num_topics and num_words: for num_topics topics, it returns the num_words most significant words (10 words per topic by default). See http://radimrehurek.com/gensim/models/ldamodel.html#gensim.models.ldamodel.LdaModel.show_topics
So you can pass len(lda.id2word) as num_words to get the full word distribution for each topic, and lda.num_topics as num_topics to cover every topic in your LDA model.
for i in lda.show_topics(formatted=False, num_topics=lda.num_topics, num_words=len(lda.id2word)):
    print(i)
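To make "full distribution" concrete, here is a minimal sketch that does not depend on gensim. It assumes you already have a topic-by-word probability matrix (one row per topic, one column per word id) and an id-to-word mapping. The function name full_topic_distributions and the toy data are hypothetical, made up for illustration only.

```python
def full_topic_distributions(topic_term_matrix, id2word):
    """Return, for each topic, every (word, probability) pair, most probable first."""
    result = []
    for row in topic_term_matrix:
        # Column index is the word id; look up the word string for each one.
        pairs = [(id2word[word_id], float(prob)) for word_id, prob in enumerate(row)]
        pairs.sort(key=lambda pair: pair[1], reverse=True)
        result.append(pairs)
    return result


# Toy data, not real model output: 2 topics over a 3-word vocabulary.
toy_matrix = [[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3]]
toy_id2word = {0: "graph", 1: "system", 2: "user"}

for topic_id, pairs in enumerate(full_topic_distributions(toy_matrix, toy_id2word)):
    print(topic_id, pairs)
```

If your gensim version provides LdaModel.get_topics(), its return value should be usable as the matrix here, with the dictionary as id2word. Treat that as an assumption to check against your installed version.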
Upvotes: 4
Reputation: 122260
show_topics() has a topn parameter that specifies how many of the top N words to return from each topic's word distribution (newer gensim releases name this parameter num_words). See http://radimrehurek.com/gensim/models/ldamodel.html
So instead of calling lda.show_topics() with its defaults, you can pass len(dictionary) to get the full word distribution for each topic:
for i in lda.show_topics(topn=len(dictionary)):
    print(i)
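The answers in this thread use two different keyword names for the word count, topn and num_words, which suggests the name depends on the installed version. One possible way to cope with that is to inspect the method's signature and pass whichever keyword it accepts. This is only a sketch: the helper show_all_words and the two stub classes are hypothetical stand-ins, not gensim classes.

```python
import inspect


def show_all_words(model, vocab_size):
    """Call model.show_topics asking for vocab_size words, using the keyword it accepts."""
    params = inspect.signature(model.show_topics).parameters
    # Pick num_words if the method accepts it, otherwise fall back to topn.
    keyword = "num_words" if "num_words" in params else "topn"
    return model.show_topics(**{keyword: vocab_size})


# Hypothetical stand-ins for two library versions, used only to exercise the helper.
class OldStyleModel:
    def show_topics(self, topn=10):
        return ("topn", topn)


class NewStyleModel:
    def show_topics(self, num_topics=10, num_words=10):
        return ("num_words", num_words)


print(show_all_words(OldStyleModel(), 42))
print(show_all_words(NewStyleModel(), 42))
```

With a real model you would call show_all_words(lda, len(dictionary)) and print each returned topic.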
Upvotes: 8