
Reputation: 3355

Why don't the predictions from a RandomForestClassifier's .predict() and .predict_proba() match?

I trained a simple random forest classifier and then tested it on the same input with both methods:

rf_clf.predict([[50,0,500,0,20,0,250000,1.5,110,0,0,2]])

rf_clf.predict_proba([[50,0,500,0,20,0,250000,1.5,110,0,0,2]])

The first line returns array([1.]), whereas the second returns array([[0.14, 0.86]]). Isn't the prediction the first value, 0.14?

How come those two don't match? I'm a bit confused. Thanks.

Upvotes: 2

Views: 1268

Answers (2)

Kishore Sampath

Reputation: 1001

The predict() function returns the class a sample is assigned to, whereas predict_proba() returns the probability of that sample belonging to each of the output classes.

Example: the output of predict(), array([1.]), tells you the sample is assigned to class 1.

The output of predict_proba(), array([[0.14, 0.86]]), gives the probability of the sample belonging to each output class: 14% for class 0 and 86% for class 1. For a random forest, predict() returns the class with the highest mean probability across the trees (here class 1, at 0.86), so the two outputs are consistent.
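A minimal sketch of that relationship, assuming scikit-learn and a synthetic 12-feature binary dataset from make_classification standing in for the asker's data (the data, n_estimators and random_state are assumptions, not from the question). It checks that predict() gives the same labels as taking the argmax of predict_proba() and mapping it back through classes_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data standing in for the asker's 12-feature dataset (assumption).
X, y = make_classification(n_samples=200, n_features=12, random_state=0)
rf_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

sample = X[:1]                        # one sample, shape (1, 12)
proba = rf_clf.predict_proba(sample)  # shape (1, n_classes), columns ordered as rf_clf.classes_
label = rf_clf.predict(sample)        # shape (1,)

# predict() picks the column with the highest probability and maps it back through classes_.
manual_label = rf_clf.classes_[np.argmax(proba, axis=1)]
print(rf_clf.classes_, proba, label, manual_label)

# The same holds for every sample, not just the first one.
all_labels = rf_clf.predict(X)
all_manual = rf_clf.classes_[np.argmax(rf_clf.predict_proba(X), axis=1)]
print((all_labels == all_manual).all())
```

So a row like [0.14, 0.86] means "class 0 with probability 0.14, class 1 with probability 0.86", and predict() reports the larger one.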

See the docs: predict() docs, predict_proba() docs

Upvotes: 3

sandertjuh

Reputation: 570

Take a look at the documentation part of sklearn.ensemble.RandomForestClassifier, specifically the predict_proba method.

Returns: ndarray of shape (n_samples, n_classes), or a list of n_outputs such arrays if n_outputs > 1. The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

The output you're getting (array([[0.14, 0.86]])) is thus, for each input sample, the probability of each class in classes_ (the classes seen during training), in that order. The method predict() simply returns one class for each input, the one with the highest probability, which is why you get array([1.]).
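A small sketch of the shapes and column order described in the docs quote, using hypothetical toy data with string labels (the data and hyperparameters are made up for illustration). scikit-learn stores classes_ in sorted order, so here column 0 of predict_proba() is "no" and column 1 is "yes", even though "yes" appears first in y.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: two well-separated clusters with string labels.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

X_new = [[0, 0], [6, 6], [3, 3]]
proba = clf.predict_proba(X_new)  # shape (n_samples, n_classes) == (3, 2)
pred = clf.predict(X_new)         # shape (n_samples,) == (3,)

for row, p, label in zip(X_new, proba, pred):
    # Pair each probability with its class name via classes_, then show predict()'s choice.
    print(row, dict(zip(clf.classes_, p.round(2))), "->", label)
```

Reading predict_proba() through classes_ avoids assuming that the first column is "the prediction".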

Upvotes: 2
