Buttons840

Reputation: 9637

How do I know what priors I'm giving to scikit-learn? (Naive Bayes classifiers.)

In scikit-learn's naive Bayes classifiers you can specify the prior probabilities, and the classifier will use those provided probabilities in its calculations. But I don't know how the prior probabilities should be ordered.

import random
from sklearn.naive_bayes import BernoulliNB

data = [[0], [1]]
classes = ['light bulb', 'door mat']
random.shuffle(classes)  # This simulates getting classes from a complex source.
classifier = BernoulliNB(class_prior=[0, 1])  # Here we provide prior probabilities.
classifier.fit(data, classes)

In the above code, how do I know which class is assumed to be the 100% prior? Do I need to consider the order of the classes in the data before specifying prior probabilities?

I would also be interested in knowing where this is documented.

Upvotes: 5

Views: 2539

Answers (3)

moooeeeep

Reputation: 32502

Deep within the code base the following happens: the class labels you provide per sample in the call to fit() are reduced to their unique values, sorted, and then stored in that order in the classifier object (alphabetical or numerical order). The priors provided to __init__() correspond to the classes in exactly this order.
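A minimal sketch of how this ordering could be checked after fitting, assuming a scikit-learn version in which BernoulliNB exposes the classes_ and class_log_prior_ attributes. The prior values 0.25 and 0.75 and the variable names are illustrative, not taken from the question:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

data = [[0], [1]]
labels = ['light bulb', 'door mat']  # deliberately not in sorted order

# Illustrative priors: they are matched against the sorted labels,
# i.e. ['door mat', 'light bulb'], not the order in `labels`.
classifier = BernoulliNB(class_prior=[0.25, 0.75])
classifier.fit(data, labels)

# classes_ holds the labels in the order the priors were matched against.
fitted_classes = classifier.classes_.tolist()

# class_log_prior_ stores the log of each prior in that same order.
priors_by_class = dict(zip(fitted_classes, np.exp(classifier.class_log_prior_)))
```

Inspecting priors_by_class after fitting is one way to confirm which class received which prior.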

Apparently this is undocumented.

For further reading:

Upvotes: 1

alko

Reputation: 48297

It seems to be undocumented. When fit() is called, the target is preprocessed by LabelBinarizer, so you can get your data's classes with

from sklearn.preprocessing import LabelBinarizer
labelbin = LabelBinarizer()
labelbin.fit_transform(classes)

Then labelbin.classes_ contains the resulting classes for your target data (classes), in the same order as the priors.

Upvotes: 5

Fred Foo

Reputation: 363507

The order is that of classes after sorting, so P(light bulb)=.4 would be specified using [.6, .4] because "door mat" < "light bulb".
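One way to avoid working out the sorted order by hand is to keep the priors in a dict keyed by class name and build the list from it. This is only a sketch: ordered_priors is a hypothetical helper, not part of scikit-learn, and it assumes that Python's sorted() orders the labels the same way the library does, which should hold for plain ASCII strings and for numbers.

```python
def ordered_priors(prior_by_class, labels):
    """Return priors as a list ordered by the sorted unique labels.

    Hypothetical helper, not a scikit-learn function.
    """
    ordered_classes = sorted(set(labels))
    missing = [c for c in ordered_classes if c not in prior_by_class]
    if missing:
        raise KeyError(f"no prior given for classes: {missing}")
    return [prior_by_class[c] for c in ordered_classes]


labels = ['light bulb', 'door mat', 'light bulb']
# "door mat" sorts before "light bulb", so its prior comes first.
priors = ordered_priors({'light bulb': 0.4, 'door mat': 0.6}, labels)
```

The resulting list can then be passed as class_prior, for example BernoulliNB(class_prior=priors).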

Upvotes: 2
