Julia

Reputation: 1136

Random Forest Regression - How do I analyse its performance? - python, sklearn

I'm struggling to assess the performance of my random forest - I've looked at the mean relative error, but I'm not sure if it's a good indicator. What are some things to check for?

Also, how should I optimise my hyperparameters? I've used rf.score(X_test, y_test), which returns R², but is that really the only thing I should rely on for regression? I also had a look into out-of-bag scores, but I'm not sure how to interpret them.

May your optima be global and your hyper-parameters optimized :)

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=1000, max_depth=5, random_state=0)
rf.fit(X_train, y_train)

predictions = rf.predict(X_test)

# Mean relative error: average of |prediction - actual| / |actual|
errors = np.abs((predictions - y_test) / y_test)
print('Mean Relative Error:', round(np.mean(errors), 2))
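
For reference, this is the out-of-bag variant I looked at (just a minimal sketch reusing the arrays from the snippet above; oob_score=True and the oob_score_ attribute are standard scikit-learn API, and for a regressor the reported value is an R² computed on the samples each tree never saw during bootstrapping):

from sklearn.ensemble import RandomForestRegressor

# Same model as above, but with the out-of-bag estimate switched on
rf_oob = RandomForestRegressor(n_estimators=1000, max_depth=5,
                               oob_score=True, random_state=0)
rf_oob.fit(X_train, y_train)

# R^2 estimated on out-of-bag samples, comparable to rf.score(X_test, y_test)
print('Out-of-bag R^2:', rf_oob.oob_score_)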

Upvotes: 12

Views: 24700

Answers (3)

mhellmeier

Reputation: 2282

To extend the answer from Ihar Yazerski: there are many more functions implemented in scikit-learn for model and performance evaluation. A complete list of all scoring parameters is provided in the documentation. In addition, some metrics such as RMSE and MAPE no longer need to be calculated by hand (scikit-learn version >= 0.24) because they are implemented as library functions, which also makes them more robust against errors such as division by zero.

An extended version of the mentioned answer with all currently available regression metrics might look like this:

from sklearn import metrics

y_true = [...]  # Your real values / test labels
y_pred = [...]  # The predictions from your ML / RF model

print('Mean Absolute Error (MAE):', metrics.mean_absolute_error(y_true, y_pred))
print('Mean Squared Error (MSE):', metrics.mean_squared_error(y_true, y_pred))
# squared=False returns the root of the MSE, i.e. the RMSE
print('Root Mean Squared Error (RMSE):', metrics.mean_squared_error(y_true, y_pred, squared=False))
print('Mean Absolute Percentage Error (MAPE):', metrics.mean_absolute_percentage_error(y_true, y_pred))
print('Explained Variance Score:', metrics.explained_variance_score(y_true, y_pred))
print('Max Error:', metrics.max_error(y_true, y_pred))
# The log-error and deviance metrics below require non-negative
# (log error, Poisson) or strictly positive (gamma) values
print('Mean Squared Log Error:', metrics.mean_squared_log_error(y_true, y_pred))
print('Median Absolute Error:', metrics.median_absolute_error(y_true, y_pred))
print('R^2:', metrics.r2_score(y_true, y_pred))
print('Mean Poisson Deviance:', metrics.mean_poisson_deviance(y_true, y_pred))
print('Mean Gamma Deviance:', metrics.mean_gamma_deviance(y_true, y_pred))

If you need more information about the respective metrics, have a look at the scikit-learn User Guide.
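
A minimal sketch of how those scoring parameters can be used directly, e.g. to get the metrics per fold during cross-validation (this assumes the estimator and the X_train / y_train arrays from the question; the strings are the documented scorer names):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rf = RandomForestRegressor(n_estimators=1000, max_depth=5, random_state=0)

# Scorers follow a "higher is better" convention, hence the neg_ prefix
# for error metrics; negate the result to get the usual positive errors.
mae = -cross_val_score(rf, X_train, y_train, cv=5, scoring='neg_mean_absolute_error')
rmse = -cross_val_score(rf, X_train, y_train, cv=5, scoring='neg_root_mean_squared_error')
r2 = cross_val_score(rf, X_train, y_train, cv=5, scoring='r2')

print('MAE per fold:', mae)
print('RMSE per fold:', rmse)
print('R^2 per fold:', r2)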

Upvotes: 7

Ihar Yazerski

Reputation: 121

For a regression model (not to be confused with a classifier model), you can evaluate MAE, MSE, MAPE and RMSE with sklearn:

import numpy as np
from sklearn import metrics

# gt = ground-truth target values, pred = the model's predictions
print('Mean Absolute Error (MAE):', metrics.mean_absolute_error(gt, pred))
print('Mean Squared Error (MSE):', metrics.mean_squared_error(gt, pred))
print('Root Mean Squared Error (RMSE):', np.sqrt(metrics.mean_squared_error(gt, pred)))

# Manual MAPE (breaks down if gt contains zeros)
mape = np.mean(np.abs((gt - pred) / gt))
print('Mean Absolute Percentage Error (MAPE):', round(mape * 100, 2))
print('Accuracy:', round(100 * (1 - mape), 2))  # informal "accuracy" = 100% - MAPE

Upvotes: 12

Francisco Cantero

Reputation: 64

You can also add these two metrics if your random forest is a classifier rather than a regressor:

from sklearn.metrics import accuracy_score, confusion_matrix

# Both take (y_true, y_pred)
print(accuracy_score(my_class_column, my_forest_train_prediction))
print(confusion_matrix(my_test_data, my_prediction_test_forest))

The class probability for each prediction can also be obtained:

# predict_proba expects a 2D array of samples (e.g. X_test from the question)
# and returns one row of class probabilities per sample
my_classifier_forest.predict_proba(X_test)
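
A self-contained sketch pulling those three calls together (the toy data and variable names here are purely illustrative and not from the original answer):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Illustrative toy data; replace with your own features and class labels
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print('Accuracy:', accuracy_score(y_test, y_pred))
print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))
print('Class probabilities (first 5 samples):\n', clf.predict_proba(X_test)[:5])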

Upvotes: -3
