Reputation: 9637
In scikit-learn's naive Bayes classifiers you can specify the prior probabilities, and the classifier will use those provided probabilities in its calculations. But I don't know how the prior probabilities should be ordered.
import random

from sklearn.naive_bayes import BernoulliNB

data = [[0], [1]]
classes = ['light bulb', 'door mat']
random.shuffle(classes)  # This simulates getting classes from a complex source.
classifier = BernoulliNB(class_prior=[0, 1])  # Here we provide prior probabilities.
classifier.fit(data, classes)
In the above code, how do I know which class is given the 100% prior probability? Do I need to consider the order of the classes in the data before specifying prior probabilities?
I would also be interested in knowing where this is documented.
Upvotes: 5
Views: 2539
Reputation: 32502
Deep within the code base the following happens: the classes you provide per sample to the call of fit() are deduplicated, sorted, and then stored in that order (numerical or alphabetical) in the classifier object. The priors provided to __init__() correspond to the classes in this exact order.
Apparently this is undocumented.
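For example, a quick check after fitting (a minimal sketch that reuses the data and class names from the question; the 0.6/0.4 priors are illustrative values) confirms the sorted order and which prior ended up attached to which class:
import numpy as np
from sklearn.naive_bayes import BernoulliNB

data = [[0], [1]]
classes = ['light bulb', 'door mat']

# Priors must follow the sorted class order, i.e. ['door mat', 'light bulb'].
classifier = BernoulliNB(class_prior=[0.6, 0.4])
classifier.fit(data, classes)

print(classifier.classes_)                  # ['door mat' 'light bulb']
print(np.exp(classifier.class_log_prior_))  # [0.6 0.4]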
Upvotes: 1
Reputation: 48297
It seems to be undocumented. When fit() is called, the target is preprocessed by LabelBinarizer, so you can get your data's classes with
from sklearn.preprocessing import LabelBinarizer

labelbin = LabelBinarizer()
labelbin.fit_transform(classes)
Then labelbin.classes_ contains the resulting classes for your target data (classes), in the order corresponding to the priors.
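Putting it together (a sketch; the 0.6/0.4 priors keyed by class name are illustrative values, not part of the question), you can use labelbin.classes_ to line the priors up before constructing the classifier:
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import LabelBinarizer

data = [[0], [1]]
classes = ['light bulb', 'door mat']

labelbin = LabelBinarizer()
labelbin.fit_transform(classes)

# Illustrative priors keyed by class name, aligned with labelbin.classes_,
# which is sorted: ['door mat', 'light bulb'].
wanted = {'door mat': 0.6, 'light bulb': 0.4}
class_prior = [wanted[c] for c in labelbin.classes_]  # [0.6, 0.4]

classifier = BernoulliNB(class_prior=class_prior).fit(data, classes)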
Upvotes: 5
Reputation: 363507
The order is that of classes after sorting, so P(light bulb) = .4 would be specified using [.6, .4], because "door mat" < "light bulb".
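A one-liner makes the ordering concrete (a trivial sketch using the question's class names):
classes = ['light bulb', 'door mat']

# 'door mat' sorts before 'light bulb', so class_prior=[.6, .4]
# assigns .6 to 'door mat' and .4 to 'light bulb'.
print(sorted(set(classes)))  # ['door mat', 'light bulb']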
Upvotes: 2