Reputation: 11
I'm looking to use Mallet to classify documents by topics that I have defined myself. I know that Mallet normally infers the topics first and then assigns documents to them, but I want to skip the first step because I already have a list of topics, each with its associated words. Is there any way to use these pre-defined topic word lists to classify documents with Mallet?
Any guidance is appreciated. Thanks!
Upvotes: 1
Views: 167
Reputation: 1683
If you're doing unsupervised learning (no training examples, i.e. no labeled docs for each topic), you cannot simply set the topics yourself. The point is that the training algorithm knows nothing about the docs in advance. It just tries to separate/distribute them based on the features you provide.
If you're doing supervised learning, the topics are actually classes, and you have example documents for each class. The algorithm then tries to learn which features are significant for each class. In Mallet you should use the Classification module for this.
There are probably some fancier topic modelling approaches that incorporate specific keywords or skew the topic distributions towards them, but I don't think that's possible with Mallet.
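If the word lists are all you have, one crude baseline that sits entirely outside Mallet is to count how many of each topic's words occur in a document and pick the topic with the most hits. The sketch below is only that: plain keyword matching, not topic modelling and not Mallet's classifier. The class name `KeywordTopicScorer`, its methods, and the sample topics are all made up for illustration, and it assumes the word lists are lowercase single words.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Mallet: scores a document against
// hand-written topic word lists by counting matching tokens.
final class KeywordTopicScorer {

    // Lowercase the text and split on anything that is not a letter or digit.
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^\\p{L}\\p{N}]+");
    }

    // Count how many tokens of the document appear in each topic's word list.
    // Assumes the word lists contain lowercase single words.
    static Map<String, Integer> score(String document, Map<String, Set<String>> topics) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (String topic : topics.keySet()) {
            scores.put(topic, 0);
        }
        for (String token : tokenize(document)) {
            for (Map.Entry<String, Set<String>> e : topics.entrySet()) {
                if (e.getValue().contains(token)) {
                    scores.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }
        return scores;
    }

    // Return the topic with the highest count, or null if no keyword matched.
    // On a tie, the topic that comes first in the map's iteration order wins.
    static String classify(String document, Map<String, Set<String>> topics) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> e : score(document, topics).entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up topics and words, purely for demonstration.
        Map<String, Set<String>> topics = new LinkedHashMap<>();
        topics.put("sports", new HashSet<>(Arrays.asList("goal", "match", "team")));
        topics.put("finance", new HashSet<>(Arrays.asList("stock", "market", "bank")));
        System.out.println(classify("The team scored a late goal in the match.", topics));
    }
}
```

This ignores word frequency weighting, stemming, and multi-word terms, so treat it as a sanity check at best. If you can label even a few documents per topic, training a real classifier as described above is likely to work better.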
Upvotes: 1