Prashanth

Reputation: 109

Confidence vs Probability in Random Forest Algorithm

I've been trying to run the Random Forest classifier using scikit-learn, and I'd like to understand the difference between probability and confidence. Let's assume we have 5 classes: A, B, C, D and E. If I run predict_proba() and get a match for class A, is the value returned the probability of the sample being class A among the 5 classes? In other words, if class A gets a probability of 0.95, is the remaining 0.05 shared among the other four classes? If that's the case, I'd like to know whether there's a way to get a confidence level for a prediction, meaning how confident the classifier is that it predicted class A with 0.95 probability. Is there such a mechanism?
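As a minimal sketch of the first part of the question: predict_proba() returns one row per sample with one column per class (in the order of clf.classes_), and each row sums to 1. The dataset below is synthetic and generated only for illustration, with integer labels 0 to 4 standing in for A to E.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 5-class data (an assumption for illustration; labels 0..4 stand in for A..E)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=5, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# One row per sample, one column per class, ordered as in clf.classes_
proba = clf.predict_proba(X[:3])
row_sums = proba.sum(axis=1)

print(clf.classes_)
print(proba)
print(row_sums)  # each row sums to 1, so the mass not given to one class goes to the others
```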

The reason I'd like to understand this is the following: suppose I feed in data that doesn't belong to any of the 5 classes. I'd like the classifier to report that it doesn't belong to any of them. I suspect the classifier would currently try to fit it into one of the 5 classes and could possibly return a high probability, even though it's not confident about it.
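One simple thing to try for this is rejecting predictions whose highest class probability falls below a cutoff. The sketch below is only that: predict_with_rejection, the threshold value, and the -1 "none of the classes" sentinel are hypothetical names and choices, not part of scikit-learn, and the data is synthetic. As the question suspects, a forest can still assign a high probability to a sample unlike anything it was trained on, so this is not a reliable out-of-class detector on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 5-class data (an assumption for illustration)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=5, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def predict_with_rejection(model, samples, threshold):
    """Hypothetical helper: return the predicted label, or -1 when the
    highest class probability is below `threshold`."""
    proba = model.predict_proba(samples)
    labels = model.classes_[proba.argmax(axis=1)]
    return np.where(proba.max(axis=1) >= threshold, labels, -1)


# The threshold is a tuning choice; 0.6 here is arbitrary
labels = predict_with_rejection(clf, X[:10], threshold=0.6)
print(labels)
```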

Upvotes: 3

Views: 4413

Answers (1)

Dirk Nachbar

Reputation: 522

The probabilities returned by predict_proba() are not confidence intervals.

To get confidence intervals, you need to use this extension: http://contrib.scikit-learn.org/forest-confidence-interval/
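Independently of that extension, a rough feel for how much the individual trees disagree can be had from scikit-learn alone, since the fitted trees are exposed as clf.estimators_. The sketch below is a swapped-in illustration, not the extension's method and not a calibrated confidence interval: it computes the per-tree class probabilities and their standard deviation on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 5-class data (an assumption for illustration)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=5, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

X_new = X[:3]

# Shape: (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(X_new) for tree in clf.estimators_])

mean_proba = per_tree.mean(axis=0)  # matches clf.predict_proba(X_new)
std_proba = per_tree.std(axis=0)    # spread across trees, per sample and class

print(mean_proba)
print(std_proba)
```

A large spread for the predicted class suggests the trees disagree on that sample, which is one informal signal that the averaged probability should be treated with caution.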

Upvotes: 1
