Reputation: 13
I am using gensim's LDA for topic modeling and getting results like this:
Topic 1: word1 word2 word3 word4
Topic 2: word4 word1 word2 word5
Topic 3: word1 word4 word5 word6
However, running MALLET's LDA on the same data does not produce duplicate words across topics. I train the model on ~20 documents with more than 1000 words each. How can I get rid of words that appear in multiple topics?
Upvotes: 1
Views: 1167
Reputation: 993
In LDA, every word is part of every topic, just with a different probability, so the same word can show up in the top words of several topics. You could define a minimum probability that a word must reach before it is printed, but I would be very surprised if MALLET didn't come up with at least a couple of "duplicate" words across topics as well. Make sure to use the same parameters for both gensim and MALLET.
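To illustrate the minimum-probability idea, here is a minimal sketch in plain Python. It assumes you already have each topic as a list of `(word, probability)` pairs, which is the shape gensim's `show_topic` returns. The function name `filter_topic_words`, the `drop_shared` option, and all the probabilities below are made up for the example, not taken from your model.

```python
from collections import Counter


def filter_topic_words(topics, min_prob=0.01, drop_shared=False):
    """Filter the words printed for each topic.

    topics: dict mapping topic id -> list of (word, probability) pairs.
    min_prob: words below this probability in a topic are dropped.
    drop_shared: if True, also drop words that survive in more than one topic.
    """
    # Keep only words whose probability in the topic reaches the threshold.
    kept = {
        tid: [(w, p) for w, p in pairs if p >= min_prob]
        for tid, pairs in topics.items()
    }
    if drop_shared:
        # Count in how many topics each surviving word appears.
        counts = Counter(w for pairs in kept.values() for w, _ in pairs)
        kept = {
            tid: [(w, p) for w, p in pairs if counts[w] == 1]
            for tid, pairs in kept.items()
        }
    return kept


# Hypothetical topic-word probabilities, loosely following the question.
topics = {
    1: [("word1", 0.30), ("word2", 0.20), ("word3", 0.10), ("word4", 0.005)],
    2: [("word4", 0.25), ("word1", 0.004), ("word2", 0.003), ("word5", 0.002)],
    3: [("word5", 0.20), ("word6", 0.15), ("word1", 0.006), ("word4", 0.005)],
}

for tid, pairs in filter_topic_words(topics, min_prob=0.01).items():
    print(f"Topic {tid}:", " ".join(w for w, _ in pairs))
```

The threshold only hides low-probability words when printing; it does not change the model. `drop_shared=True` is a blunter option that removes any word still listed under more than one topic, which can leave a topic with no words at all.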
Upvotes: 0