Reputation: 7935
Most of the examples I see with NaiveBayesClassifier use just two labels: 'pos' and 'neg'. I want to get the topic of a text, like entertainment, sports, movies, politics, or literature. Is it possible to train NaiveBayesClassifier for this, or should I be looking somewhere else?
Upvotes: 1
Views: 2669
Reputation: 43817
Sure it is. When you pass the training set to the NaiveBayesClassifier.train method, it builds a Bayes model for each label that appears in the training set. If your training set has multiple labels, your classifier will classify into those labels; if it has only two labels, it will only ever return one of those two. When you ask the classifier to classify a feature set, it returns the label whose model assigns the highest probability to that feature set.
In a Bayes classifier, a probability model is created for each label, and the model that best matches the features is chosen. Here is a made-up example:
import nltk
articles = [({'entertaining':0.6, 'informative':0.2, 'statistical':0.6}, 'sports'),
({'entertaining':0.7, 'informative':0.2, 'statistical':0.8}, 'sports'),
({'entertaining':0.1, 'informative':0.7, 'statistical':0.2}, 'news'),
({'entertaining':0.2, 'informative':0.8, 'statistical':0.3}, 'news'),
({'entertaining':0.8, 'informative':0.2, 'statistical':0.1}, 'movies')]
classifier = nltk.NaiveBayesClassifier.train(articles)
label = classifier.classify({'entertaining':0.9, 'informative':0.2, 'statistical':0.1})
print(label)
#movies
probabilities = classifier.prob_classify({'entertaining':0.9, 'informative':0.2, 'statistical':0.1})
for sample in probabilities.samples():
    print("{0}: {1:.4f}".format(sample, probabilities.prob(sample)))
#news: 0.0580
#sports: 0.2999
#movies: 0.6522
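To see why nothing about the algorithm limits it to two labels: the classifier computes P(label) times the product of P(feature | label) for every label seen in training, and picks the argmax. Here is a minimal pure-Python sketch of that scoring over word-presence features (independent of NLTK; the topics, sentences, and function names are invented for illustration, with add-one smoothing as an assumed detail):

```python
from collections import Counter, defaultdict
import math

def train(labeled_docs):
    """labeled_docs: list of (list_of_words, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        label_counts[label] += 1
        for w in words:
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def classify(model, words):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, float('-inf')
    for label, count in label_counts.items():
        # log P(label) plus sum of log P(word | label), add-one smoothed
        logp = math.log(count / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            logp += math.log((word_counts[label][w] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

docs = [("the team won the match".split(), 'sports'),
        ("score goal league season".split(), 'sports'),
        ("new film premiere actor".split(), 'movies'),
        ("director cast film review".split(), 'movies'),
        ("election vote parliament policy".split(), 'politics'),
        ("minister vote campaign debate".split(), 'politics')]

model = train(docs)
print(classify(model, "film actor review".split()))   # movies
print(classify(model, "vote campaign".split()))       # politics
```

Adding a fourth or fifth topic is just a matter of adding training documents with that label; the argmax loop does not change.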
Upvotes: 8