Reputation: 17
I am new to topic modeling and a bit confused. I have run MALLET several times with different values for the number of topics. How do I know which one to choose for further analysis? I know there are papers on evaluating topic models, but I can't code something like that myself.
Upvotes: 0
Views: 84
Reputation: 1901
Don't think of the number of topics as a natural characteristic of your documents. Real documents aren't actually combinations of multinomial distributions (that is only the model's simplifying assumption), so there is no "right" answer. Instead, there is a wide range of good values.
You should think of the number of topics as the scale of a map of your collection. If you want a broad overview, use fewer topics. If you want more detail, use more. The right number is the value that produces meaningful results that allow you to accomplish your goal.
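If reading through every run by hand feels overwhelming, a small script can help you compare them. The following is only a rough sketch, not an evaluation method from the literature: it assumes each run was saved with MALLET's `--output-topic-keys` option (one topic per line: topic id, weight, and top words, separated by tabs), prints the top words per topic, and flags pairs of topics whose top words overlap heavily. The function names, the sample data, and the 0.5 threshold are all made up for illustration. Many near-duplicate topics may hint that the map is more detailed than you need, but your own reading of the topics should decide.

```python
from itertools import combinations


def parse_topic_keys(text):
    """Parse topic-keys text: topic id, weight, top words (tab-separated)."""
    topics = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        topics[int(parts[0])] = parts[2].split()
    return topics


def overlapping_topics(topics, threshold=0.5):
    """Return (topic_a, topic_b, jaccard) for pairs with heavily shared top words.

    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    pairs = []
    for (a, words_a), (b, words_b) in combinations(sorted(topics.items()), 2):
        set_a, set_b = set(words_a), set(words_b)
        union = set_a | set_b
        if not union:
            continue
        score = len(set_a & set_b) / len(union)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical sample standing in for the contents of one topic-keys file.
sample = (
    "0\t0.5\tgame team season player coach\n"
    "1\t0.5\tteam game player season league\n"
    "2\t0.5\tmarket stock price bank trade\n"
)

topics = parse_topic_keys(sample)
for topic_id, words in topics.items():
    print(topic_id, " ".join(words))

for a, b, score in overlapping_topics(topics):
    print(f"topics {a} and {b} share many top words (Jaccard {score:.2f})")
```

To use it on a real run, read the file's contents with `open(path, encoding="utf-8").read()` and pass the result to `parse_topic_keys`, then repeat for each number of topics you tried.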
Upvotes: 1