Reputation: 1651
I ran this simple naive bayes program:
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
print(clf.predict([[-0.8, -1],[-0.9, -1]]))
and the result I got is:
[1 1]
Both [-0.8, -1] and [-0.9, -1] are classified as 1.
If I know that all of my data came from the same class, i.e., [[-0.8, -1], [-0.9, -1]] came from the same class, is there a way for scikit-learn's naive Bayes classifier to classify this data as a whole (and give me [1] as the result in this case), rather than classifying every data point individually?
Upvotes: 0
Views: 1206
Reputation: 19159
The naive Bayes classifier classifies each input individually, not as a group. If you know that all of the inputs belong to the same (but unknown) class, you need to do some additional work to get a single result. One option is to select the class that appears most often in the output of clf.predict, but that majority vote may not work well when the group contains only two instances.
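A minimal sketch of that majority-vote approach, reusing the training data and model from the question:

```python
import numpy as np
from collections import Counter
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

clf = GaussianNB()
clf.fit(X, Y)

group = np.array([[-0.8, -1], [-0.9, -1]])
preds = clf.predict(group)                         # per-point labels
group_label = Counter(preds).most_common(1)[0][0]  # majority vote over the group
print(group_label)
```

With only two points a 1-1 split is possible, in which case Counter.most_common returns an arbitrary winner; that is the weakness mentioned above.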
Another option is to call predict_proba on the GaussianNB classifier, which returns the probability of every class for each input. You can then combine the individual probabilities (for example, sum them per class) to decide how to classify the group.
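A sketch of that probability-summing option (summing per-class probabilities across the group is one simple combination rule, as suggested above):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

clf = GaussianNB().fit(X, Y)

group = np.array([[-0.8, -1], [-0.9, -1]])
proba = clf.predict_proba(group)   # shape (2, 2): one probability row per input
totals = proba.sum(axis=0)         # summed probability mass per class
group_label = clf.classes_[np.argmax(totals)]
print(group_label)
```

clf.classes_ holds the class labels in the column order used by predict_proba, so indexing it with argmax maps the winning column back to a label.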
You could even combine the two approaches: use predict and select the class with the highest count, but fall back to predict_proba to break a tie.
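A sketch of the combined approach; the helper name classify_group is my own, not part of scikit-learn:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def classify_group(clf, group):
    """Label a group of inputs by majority vote over clf.predict;
    break ties using summed predict_proba mass."""
    preds = clf.predict(group)
    classes, counts = np.unique(preds, return_counts=True)
    winners = classes[counts == counts.max()]
    if len(winners) == 1:
        return winners[0]                 # clear majority
    # Tie: compare summed class probabilities, restricted to tied classes.
    totals = clf.predict_proba(group).sum(axis=0)
    tied_cols = [np.where(clf.classes_ == c)[0][0] for c in winners]
    best = tied_cols[np.argmax(totals[tied_cols])]
    return clf.classes_[best]

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
clf = GaussianNB().fit(X, Y)

print(classify_group(clf, np.array([[-0.8, -1], [-0.9, -1]])))
```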
Upvotes: 3