Kinjal Dixit

Reputation: 7935

Can I use NaiveBayesClassifier to classify into more than two classes?

Most of the examples I see with NaiveBayesClassifier use just two labels: 'pos' and 'neg'. I want to get the topic of a text, like entertainment, sports, movies, politics, or literature. Is it possible to train NaiveBayesClassifier for this, or should I be looking somewhere else?

Upvotes: 1

Views: 2669

Answers (1)

Pace

Reputation: 43817

Sure it is. When you pass the training set to the NaiveBayesClassifier.train method, it builds a probability model for each label that appears in the training set. If your training set has multiple labels, your classifier will classify into those labels; if it has only two labels, it will only ever give two classifications. When you ask the classifier to classify a feature set, it returns the label whose model assigns the highest probability to that feature set.

In a naive Bayes classifier, a probability model is created for each label, and the label whose model best matches the features is chosen. Here is a made-up example:

import nltk

# Training data: (feature dict, label) pairs; any number of distinct labels is allowed
articles = [({'entertaining':0.6, 'informative':0.2, 'statistical':0.6}, 'sports'),
            ({'entertaining':0.7, 'informative':0.2, 'statistical':0.8}, 'sports'),
            ({'entertaining':0.1, 'informative':0.7, 'statistical':0.2}, 'news'),
            ({'entertaining':0.2, 'informative':0.8, 'statistical':0.3}, 'news'),
            ({'entertaining':0.8, 'informative':0.2, 'statistical':0.1}, 'movies')]

classifier = nltk.NaiveBayesClassifier.train(articles)

# classify() returns the single most likely label
label = classifier.classify({'entertaining':0.9, 'informative':0.2, 'statistical':0.1})

print(label)
#movies

# prob_classify() returns the probability distribution over all labels
probabilities = classifier.prob_classify({'entertaining':0.9, 'informative':0.2, 'statistical':0.1})

for sample in probabilities.samples():
    print("{0}: {1}".format(sample, probabilities.prob(sample)))
#news:   0.0580
#sports: 0.2999
#movies: 0.6522
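
For real articles you will usually derive the feature dictionaries from the text itself rather than hand-coding scores. Here is a minimal sketch of that idea for the kinds of topics you mention; the word_features helper and the tiny training set are made up for illustration and are not part of NLTK:

import nltk

# Turn raw article text into a feature dict: one boolean "word present"
# feature per whitespace-separated, lowercased token
def word_features(text):
    return {word: True for word in text.lower().split()}

# Made-up labelled articles; in practice you would use many more per topic
training_data = [
    (word_features("the team won the championship game last night"), 'sports'),
    (word_features("the senate passed the new budget bill today"), 'politics'),
    (word_features("the film premiered to rave reviews at the festival"), 'movies'),
    (word_features("the novel explores themes of memory and loss"), 'literature'),
    (word_features("the singer announced a world tour this summer"), 'entertainment'),
]

topic_classifier = nltk.NaiveBayesClassifier.train(training_data)

print(topic_classifier.classify(word_features("the team won the big game")))
#sports ('team', 'won', and 'game' only occur in the sports example)

The classify and prob_classify calls work exactly as in the example above, no matter how many labels you train on.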

Upvotes: 8
