Reputation: 109
I've been trying to run the Random Forest classifier using scikit-learn. I'd like to understand the difference between probability and confidence. Let's assume we have 5 classes: A, B, C, D, and E. If I run predict_proba() and get a match for class A, is the value returned the probability of the sample being class A among the 5 classes? In other words, if class A gets a probability of 0.95, is the remaining 0.05 shared among the other four classes? If so, is there a way to get a confidence level for a prediction, meaning how confident the classifier is that it predicted class A with 0.95 probability? Is there such a mechanism?
The reason I ask is this: suppose I pass in data that doesn't belong to any of the 5 classes. I'd like to be able to report that it doesn't belong to any of them. I suspect the classifier would currently try to fit it into one of the 5 classes and could return a high probability even though it is not confident about it. Is that right?
Upvotes: 3
Views: 4413
Reputation: 522
The probabilities returned by predict_proba() are not confidence intervals.
To add confidence intervals, you can use this extension: http://contrib.scikit-learn.org/forest-confidence-interval/
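To make the distinction concrete, here is a minimal sketch using only scikit-learn. It is not the extension linked above. The synthetic five-class data from make_blobs, the far-away test point X_far, and all variable names are assumptions made for illustration. It shows that each predict_proba() row is a distribution over the known classes, and it uses the standard deviation across the individual trees as a rough, swapped-in indicator of how much the forest disagrees with itself, which is not a formal confidence interval.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Five synthetic classes standing in for A..E (illustrative data only).
X, y = make_blobs(n_samples=500, centers=5, n_features=2,
                  cluster_std=1.0, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# A hypothetical input far away from anything seen during training.
X_far = np.array([[1000.0, 1000.0]])

proba_in = clf.predict_proba(X[:3])    # shape (3, 5)
proba_far = clf.predict_proba(X_far)   # shape (1, 5)

# Every row is spread over the five known classes and sums to 1,
# so the forest has no way to say "none of the above".
row_sums = np.concatenate([proba_in, proba_far]).sum(axis=1)

# Per-tree probabilities for the far point: shape (n_trees, 1, n_classes).
per_tree = np.stack([tree.predict_proba(X_far) for tree in clf.estimators_])

# The forest's probability is the mean over trees; the standard deviation
# over trees is a crude measure of disagreement, not a confidence interval.
tree_mean = per_tree.mean(axis=0)
tree_std = per_tree.std(axis=0)

print("row sums:", row_sums)
print("far-point probabilities:", proba_far[0])
print("per-tree std for far point:", tree_std[0])
```

A far-away point can still receive a high probability for one class with little disagreement between trees, so neither number is a reliable test for inputs outside the five classes. Rejecting such inputs usually needs a separate step, for example an explicit "other" class in the training data or a dedicated outlier detector.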
Upvotes: 1