Reputation: 5494
I am interested in understanding how probability estimates are calculated by random forests, both in general and specifically in Python's scikit-learn library (where probability estimates are returned by the predict_proba function).
Thanks, Guy
Upvotes: 7
Views: 5068
Reputation: 33950
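In addition to what Andreas/Dougal said,
when you train the RF, turn on compute_importances=True (only needed in older scikit-learn releases; newer ones dropped the parameter and always compute importances).
Then inspect classifier.feature_importances_
to see which features contribute most to the splits in the RF's trees.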
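A minimal sketch of that inspection, assuming a recent scikit-learn where feature_importances_ is available without any extra flag. The synthetic dataset from make_classification and the names clf, importances and ranking are only illustrative, not part of the original question:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): 6 features, 2 informative.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=2, n_redundant=0, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, one value per feature, normalized to sum to 1.
importances = clf.feature_importances_

# Feature indices sorted from most to least important.
ranking = importances.argsort()[::-1]
for idx in ranking:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

Note that these importances describe how much each feature is used for splitting; they do not by themselves explain an individual predict_proba output.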
Upvotes: 2
Reputation: 28768
The probabilities returned by a forest are the mean probabilities returned by the trees in the ensemble (docs). The probabilities returned by a single tree are the normalized class histograms of the leaf a sample lands in, that is, the (weighted) fraction of training samples of each class in that leaf.
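Both statements can be checked numerically. The sketch below is only an illustration on synthetic data (the dataset and the variable names are assumptions): it averages the per-tree predict_proba outputs and compares them with the forest's, then rebuilds one tree's probabilities from the class values stored in its leaves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary problem, used only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

X_new = X[:5]
forest_proba = forest.predict_proba(X_new)

# 1) Forest probability = mean of the individual trees' probabilities.
tree_mean = np.mean(
    [tree.predict_proba(X_new) for tree in forest.estimators_], axis=0
)
forest_matches_mean = np.allclose(forest_proba, tree_mean)

# 2) Tree probability = normalized class histogram of the leaf.
tree = forest.estimators_[0]
leaf_ids = tree.apply(X_new)                  # leaf index for each sample
leaf_values = tree.tree_.value[leaf_ids, 0]   # per-class values in those leaves
leaf_proba = leaf_values / leaf_values.sum(axis=1, keepdims=True)
tree_matches_leaf = np.allclose(tree.predict_proba(X_new), leaf_proba)

print(forest_matches_mean, tree_matches_leaf)
```

The explicit normalization in step 2 is kept so the check does not depend on whether the installed scikit-learn version stores raw counts or already-normalized fractions in tree_.value.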
Upvotes: 13