darth vader

Reputation: 13

Words appearing across all topics in LDA

I am using gensim's LDA for topic modeling and get results like this:

Topic 1: word1 word2 word3 word4

Topic 2: word4 word1 word2 word5

Topic 3: word1 word4 word5 word6

However, running the same LDA with mallet does not produce duplicate words across topics. I train the LDA on ~20 documents with >1000 words each. How can I get rid of words that appear in multiple topics?

Upvotes: 1

Views: 1167

Answers (1)

WolfgangK

Reputation: 993

In LDA, every word in the vocabulary is part of every topic, only with a different probability. You could define a minimum probability that a word must reach before it is printed, but I would be very surprised if mallet didn't also come up with at least a couple of "duplicate" words across topics. Make sure to use the same parameters for both gensim and mallet.
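A minimal sketch of the minimum-probability idea, in plain Python. The topic contents, the probabilities, and the helper name `filter_by_min_probability` are made up for illustration. With gensim, (word, probability) pairs of this shape can be obtained from a trained model via `show_topic(topic_id, topn=...)`, but that step is left out here so the example runs on its own.

```python
# Hypothetical per-topic (word, probability) pairs; the numbers are invented
# for illustration and do not come from a real model.
topics = {
    1: [("word1", 0.30), ("word2", 0.25), ("word3", 0.20), ("word4", 0.05)],
    2: [("word4", 0.35), ("word5", 0.28), ("word1", 0.04), ("word2", 0.03)],
    3: [("word6", 0.40), ("word5", 0.08), ("word4", 0.03), ("word1", 0.02)],
}


def filter_by_min_probability(topics, min_prob):
    """Keep only the words whose probability within a topic is at least min_prob."""
    return {
        topic_id: [(word, prob) for word, prob in pairs if prob >= min_prob]
        for topic_id, pairs in topics.items()
    }


filtered = filter_by_min_probability(topics, min_prob=0.1)
for topic_id, pairs in filtered.items():
    print(f"Topic {topic_id}:", " ".join(word for word, _ in pairs))
```

The threshold of 0.1 is arbitrary. A lower value lets more low-probability words through, which is when the same word tends to show up under several topics.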

Upvotes: 0
