I'm using sklearn's Random Forest classifier for a model I've built. When using it for predictions, is there a way to get the certainty level of each prediction (i.e. the number of trees that predicted that class)?
Apparently there's a built-in method for this in RandomForestClassifier:
forest.predict_proba(X)
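A minimal sketch of how predict_proba answers the question, using the same toy X and y as the per-tree example. One hedge: scikit-learn documents the forest's predict_proba as the mean of the trees' predicted class probabilities (soft voting), not a raw vote count. With fully grown trees (the default settings) the leaves are normally pure, so each tree contributes 0 or 1 per class and the average should equal the fraction of trees voting for each class. The sketch also computes the hard votes so the two can be compared. The names certainty and vote_fraction are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data: 5 samples, 4 features, binary target
X = np.array([[1, 2, 3, 4], [1, 3, 1, 2], [4, 6, 1, 2], [3, 3, 4, 3], [1, 1, 2, 1]])
y = np.array([1, 0, 1, 1, 0])

forest = RandomForestClassifier(n_estimators=10, random_state=1).fit(X, y)

# Mean of the per-tree class probabilities; one column per class in forest.classes_
proba = forest.predict_proba(X)

# Certainty of each prediction = probability assigned to the predicted (highest) class
certainty = proba.max(axis=1)
predicted = forest.classes_[proba.argmax(axis=1)]

# Hard votes for comparison; trees inside the forest predict class indices
tree_votes = np.stack([t.predict(X).astype(int) for t in forest.estimators_])
# Fraction of trees voting for each class index k, shape (n_samples, n_classes)
vote_fraction = np.stack(
    [(tree_votes == k).mean(axis=0) for k in range(len(forest.classes_))], axis=1
)

print(predicted)
print(certainty)
print(vote_fraction)
```

If the trees are restricted (for example with max_depth or min_samples_leaf), the leaves can be impure and the averaged probabilities will generally differ from the plain vote fractions.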
There is no direct way to do this. You have to take each tree out of the forest, make single-tree predictions, and count how many of them give the same answer as the forest.
Here is an example:
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# modelling data
X = np.array([[1, 2, 3, 4], [1, 3, 1, 2], [4, 6, 1, 2], [3, 3, 4, 3], [1, 1, 2, 1]])
# target variable
y = np.array([1, 0, 1, 1, 0])

# random forest model
forest = RandomForestClassifier(n_estimators=10, random_state=1)
# fit forest model
forest = forest.fit(X, y)

# predict with the whole forest
full_predictions = forest.predict(X)
print(full_predictions)
# [1 0 1 1 0]

# one counter per row: how many trees gave the same class as in full_predictions
counts_of_same_predictions = [0] * len(y)

# access each tree, make a prediction, and count whether it matches the forest
for tree_in_forest in forest.estimators_:
    # trees inside a fitted forest are trained on class indices 0..n_classes-1,
    # so map their output back to the original labels via forest.classes_
    single_tree_predictions = forest.classes_[tree_in_forest.predict(X).astype(int)]
    # check whether each prediction matches the global (forest's) prediction
    for j in range(len(single_tree_predictions)):
        if single_tree_predictions[j] == full_predictions[j]:
            # increment the count for that row
            counts_of_same_predictions[j] += 1

print('counts of same classifications', counts_of_same_predictions)
# counts of same classifications [6, 7, 8, 8, 8]
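The per-tree counting loop can also be vectorized with NumPy. This is a sketch under the same setup (same toy data, n_estimators=10, random_state=1): stack every tree's predictions into an (n_trees, n_samples) array, compare it with the forest's predictions by broadcasting, and sum over the tree axis. The sub-trees inside a fitted forest are trained on encoded class indices, so their predictions are mapped back through forest.classes_ before comparing.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[1, 2, 3, 4], [1, 3, 1, 2], [4, 6, 1, 2], [3, 3, 4, 3], [1, 1, 2, 1]])
y = np.array([1, 0, 1, 1, 0])

forest = RandomForestClassifier(n_estimators=10, random_state=1).fit(X, y)
full_predictions = forest.predict(X)

# Shape (n_trees, n_samples): each row holds one tree's predicted labels,
# mapped from class indices back to the original labels
tree_predictions = np.stack(
    [forest.classes_[t.predict(X).astype(int)] for t in forest.estimators_]
)

# Broadcasting compares every tree's row with the forest's predictions;
# summing over the tree axis gives the number of agreeing trees per sample
counts_of_same_predictions = (tree_predictions == full_predictions).sum(axis=0)
print('counts of same classifications', counts_of_same_predictions)
```

Dividing counts_of_same_predictions by forest.n_estimators turns the counts into the fraction of agreeing trees, which is a per-sample certainty score in the sense the question asks for.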