Reputation: 11
data1 = [tokens.doc2bow(text) for text in texts]
ldamodel = gensim.models.ldamodel.LdaModel(
    corpus=data1, id2word=tokens, num_topics=10, random_state=100,
    update_every=1, chunksize=10, passes=10, alpha='auto',
    per_word_topics=True)
print(*ldamodel.print_topics(), sep="\n")
lda = ldamodel[data1]
l = [ldamodel.get_document_topics(item) for item in data1]
print(l)
When I execute get_document_topics(), it gives an output of hundreds of lines (as shown in the picture). I don't know what it means. I actually want the probabilities of the topics. Which method should I use to get the topic probabilities?
Upvotes: 0
Views: 2462
Reputation: 54243
Those are the topic probabilities. Your line...
l = [ldamodel.get_document_topics(item) for item in data1]
...essentially says, "give me a list, where each entry in that list is the topic-probabilities for the same entry in data1".
So, the very first item in that returned list...
[(0, 0.974673)]
...means that your very-first document is assigned a 97.4673% chance of being in topic #0.
If you instead want the probabilities for a single document, say the document in slot 6, you'd instead run:
doc_6_topics = ldamodel.get_document_topics(data1[6])
So your existing code is already reporting the per-doc topic probabilities. If your true need is, "How do I get these into another format for another purpose?", you should edit/expand your question with more details about why the existing return value doesn't meet your needs, what would meet your needs, and what you're trying to do next.
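Note that get_document_topics() returns a sparse list of (topic_id, probability) pairs, omitting topics below a minimum-probability threshold. If you want one probability per topic for every document, a minimal sketch of the conversion (plain Python, with num_topics matching the 10 used in your LdaModel call; the to_dense name is just illustrative):

```python
def to_dense(sparse_topics, num_topics):
    """Expand gensim-style (topic_id, probability) pairs into a
    dense per-topic probability vector of length num_topics."""
    dense = [0.0] * num_topics
    for topic_id, prob in sparse_topics:
        dense[topic_id] = prob
    return dense

# Using the first entry from the output shown above:
print(to_dense([(0, 0.974673)], num_topics=10))
# → [0.974673, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```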
Separate notes:
It'd be better to share the raw formatted text of what you're seeing than screenshots - see some reasons here
It's a little concerning that, in the excerpt of output shown, your early documents all wind up in topic #0. If in fact your training data is "clumpy", with all related documents in a row, it can be helpful to shuffle them before model training, so that documents of any particular topic might appear anywhere, instead of "all at the front" or "all at the back".
Upvotes: 1