Reputation: 1221
I've trained a Random Forest model (a regressor in this case) using scikit-learn (Python), and I would like to plot the error rate on a validation set as a function of the number of estimators used. In other words, is there a way to predict using only a subset of the estimators in a RandomForestRegressor?
Using predict(X) gives predictions based on the mean of the results of every tree. Is there a way to limit which trees are used? Or, alternatively, to get the individual output of each tree in the forest?
Upvotes: 0
Views: 1537
Reputation: 760
Once the forest is trained, you can access the individual trees via the estimators_ attribute of the random forest object.
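As a rough illustration of what that looks like, here is a minimal sketch. The synthetic data from make_regression and the variable names (per_tree, first_k, k) are my own assumptions for the example, not something from the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration (not from the question).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
rf = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# One row of predictions per fitted tree: shape (n_trees, n_samples).
per_tree = np.array([tree.predict(X) for tree in rf.estimators_])

# Averaging over all trees should reproduce rf.predict(X).
full_forest = per_tree.mean(axis=0)

# Averaging over only the first k trees gives a "partial forest" prediction.
k = 5
first_k = per_tree[:k].mean(axis=0)
```

Each element of estimators_ is a fitted DecisionTreeRegressor, so it has its own predict method, and you can average any subset of the trees yourself.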
Upvotes: 1
Reputation: 1221
Thanks to cohoz, I've figured out how to do it. I've written a couple of functions, which turned out to be handy for plotting the learning curve of the random forest regressor on the test set.
## Error metric: root mean squared error between predictions and known values
import numpy as np
def rmse(pred, actual):
    return np.sqrt(np.mean((actual - pred) ** 2))
## Print test set error after each additional tree
## Input the RandomForestRegressor, test set features and test set known values
def rfErrCurve(rf_model, test_X, test_y):
    p = []
    for tree in rf_model.estimators_:
        # Collect this tree's predictions, then score the average of the trees so far
        p.append(tree.predict(test_X))
        print(rmse(np.mean(p, axis=0), test_y))
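If you would rather get the errors back as an array for plotting than print them, something along these lines might work. This is only a sketch: the function name rf_error_curve, the running-mean approach, and the synthetic data are my own assumptions, not part of the code above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def rf_error_curve(rf_model, X_val, y_val):
    """Return the RMSE on (X_val, y_val) using the first 1, 2, ..., n trees."""
    per_tree = np.array([tree.predict(X_val) for tree in rf_model.estimators_])
    # Running mean over trees: row k-1 holds the prediction of the first k trees.
    counts = np.arange(1, len(per_tree) + 1)[:, None]
    running_mean = np.cumsum(per_tree, axis=0) / counts
    # One RMSE value per forest size.
    return np.sqrt(np.mean((running_mean - y_val) ** 2, axis=1))

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
errors = rf_error_curve(rf, X_val, y_val)
```

Here errors[k-1] is the validation RMSE when only the first k trees are averaged, so plotting errors against range(1, len(errors) + 1) gives the curve. Each tree predicts only once, which avoids recomputing the mean over a growing list on every iteration.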
Upvotes: 2