kbg

Reputation: 245

Output of predict_proba in scikit-learn

Suppose I have a data sample having two classes labeled 0 and 1. When I run output = clf.predict_proba(X_input), each row in output consists of 2 columns corresponding to probability of each class.

Does the first column represent the probability of class 0 or class 1? The documentation for the predict_proba method of GradientBoostingClassifier says:

"The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_."

Does that mean that whichever, 0 or 1, is the first element of the data sample corresponds to the first column in the output of predict_proba?

Upvotes: 6

Views: 5077

Answers (1)

Grr

Reputation: 16109

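Generally, a classifier has an attribute named classes_, which is populated when the model is fit and stores the class labels. The columns of the predict_proba output follow the same order as this attribute. The labels are stored in sorted order, not in the order they first appear in the data, so with classes 0 and 1 the first column is the probability of class 0.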

For example:

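from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()
# placeholder names: a feature matrix and the matching 'F'/'M' labels
nb.fit(some_gender_features, some_gender_labels)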
nb.classes_
array(['F', 'M'], dtype='<U1')

As far as I know, all of the classifiers in sklearn have this attribute once fit.
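To check this for the classifier in the question, here is a minimal sketch on made-up toy data (the arrays and variable names are my own, not from the question). Class 1 is deliberately listed first in the training labels to show that the column order follows classes_ rather than the order in which labels appear. It also looks up the column for a given class through classes_ instead of hard-coding an index, which I would guess is the safer habit.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy data, invented for illustration: class 1 appears first in y on purpose.
X = np.array([[5.0], [6.0], [7.0], [8.0], [0.0], [1.0], [2.0], [3.0]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

clf = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)

# classes_ holds the labels in the order used for the predict_proba columns.
print(clf.classes_)

# One row per input sample, one column per entry in classes_.
proba = clf.predict_proba(np.array([[0.5], [7.5]]))
print(proba)

# Find the column for class 1 via classes_ rather than assuming its position.
col_of_class_1 = list(clf.classes_).index(1)
p_class_1 = proba[:, col_of_class_1]
print(p_class_1)
```

If the printed classes_ starts with 0, then proba[:, 0] is the probability of class 0 and proba[:, 1] the probability of class 1, regardless of which label came first in y.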

Upvotes: 6
