Reputation: 3355
I trained a simple random forest classifier. When I test the prediction on the same input with both methods:
rf_clf.predict([[50,0,500,0,20,0,250000,1.5,110,0,0,2]])
rf_clf.predict_proba([[50,0,500,0,20,0,250000,1.5,110,0,0,2]])
The first line returns array([1.]), whereas the second line returns array([[0.14, 0.86]]), where the prediction is the first float, 0.14, right?
How come those two don't match? I'm a bit confused. Thanks.
Upvotes: 2
Views: 1268
Reputation: 1001
predict() returns the class that the input sample is assigned to, and predict_proba() returns the probability of the sample belonging to each of the different output classes.
Example:
The output of predict() tells you that the sample belongs to class 1, i.e. array([1.]).
The output of predict_proba() gives you the probability of the sample belonging to each output class, array([[0.14, 0.86]]): a 14% probability of class 0 and an 86% probability of class 1. predict() returns the class with the highest probability, which is why the two results agree.
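To make the relationship concrete, here is a minimal sketch that recovers the predict() result from predict_proba(). The data is a synthetic stand-in generated with make_classification (an assumption, since the original training set isn't shown), so the actual probabilities will differ from the 0.14 / 0.86 in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the asker's data (assumption: 12 features, 2 classes).
X, y = make_classification(n_samples=200, n_features=12, random_state=0)
rf_clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

sample = X[:1]                         # one row, shape (1, 12)
label = rf_clf.predict(sample)         # predicted class, shape (1,)
proba = rf_clf.predict_proba(sample)   # one probability per class, shape (1, 2)

# Picking the most probable column and mapping it through classes_
# gives the same label that predict() returns.
from_proba = rf_clf.classes_[np.argmax(proba, axis=1)]

print(label, proba, from_proba)
```

Each row of the predict_proba() output sums to 1, and the column with the largest value corresponds to the class that predict() reports.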
Refer to the docs: predict(), predict_proba().
Upvotes: 3
Reputation: 570
Take a look at the documentation for sklearn.ensemble.RandomForestClassifier, specifically the predict_proba method.
Returns: ndarray of shape (n_samples, n_classes), or a list of n_outputs such arrays if n_outputs > 1. The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
The output you're getting (array([[0.14, 0.86]])) is thus an array with one row per input sample, where each row holds the probability of every class, in the order given by classes_. So the first float, 0.14, is the probability of class 0, not the prediction. The predict() method simply predicts one class for each input, the one with the highest probability, which is why you're getting array([1.]) in return.
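As a rough illustration of how the columns line up with classes_, the sketch below uses hypothetical string labels ("ham" and "spam", which are not from the question) on synthetic data, so that the column order is easy to see.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical string labels, for illustration only.
X, y = make_classification(n_samples=200, n_features=12, random_state=0)
y_named = np.where(y == 1, "spam", "ham")

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_named)

proba = clf.predict_proba(X[:3])   # shape (n_samples, n_classes) = (3, 2)
print(clf.classes_)                # sorted labels: 'ham' first, then 'spam'
print(proba.shape)

# Pair each column of the first row with its class label.
by_class = dict(zip(clf.classes_, proba[0]))
print(by_class)
```

Zipping classes_ with a row of probabilities avoids guessing which column belongs to which class, which is the source of the confusion in the question.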
Upvotes: 2