Reputation: 1136
I'm struggling to assess the performance of my random forest - I've looked at the mean relative error, but I'm not sure if it's a good indicator. What are some things to check for?
Also, how should I optimise my hyperparameters?
I've used rf.score(X_test, y_test), which returns R2, but is that really the only thing I should rely on when doing regression? I had a look at out-of-bag scores, but I'm not sure how to interpret them.
May your optima be global and your hyper-parameters optimized :)
import numpy as np
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=1000, max_depth=5, random_state=0)
rf.fit(X_train, y_train)
predictions = rf.predict(X_test)
# Relative error per sample; undefined wherever y_test is 0
errors = np.abs((predictions - y_test) / y_test)
print('Mean Relative Error:', round(np.mean(errors), 2))
Upvotes: 12
Views: 24700
Reputation: 2282
To extend the answer from Igor Ezersky: scikit-learn implements many more functions for model and performance evaluation. A complete list of all scoring parameters is provided in the documentation. Some metrics, such as RMSE and MAPE, no longer need to be calculated manually (scikit-learn version >= 0.24) because they are implemented as library functions. The library versions are also more robust against errors such as division by zero.
An extended version of that answer, using the regression metrics available in scikit-learn, might look like this:
from sklearn import metrics
y_true = [...] # Your real values / test labels
y_pred = [...] # The predictions from your ML / RF model
print('Mean Absolute Error (MAE):', metrics.mean_absolute_error(y_true, y_pred))
print('Mean Squared Error (MSE):', metrics.mean_squared_error(y_true, y_pred))
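# Note: squared=False was deprecated in scikit-learn 1.4 in favour of metrics.root_mean_squared_error(y_true, y_pred)
print('Root Mean Squared Error (RMSE):', metrics.mean_squared_error(y_true, y_pred, squared=False))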
print('Mean Absolute Percentage Error (MAPE):', metrics.mean_absolute_percentage_error(y_true, y_pred))
print('Explained Variance Score:', metrics.explained_variance_score(y_true, y_pred))
print('Max Error:', metrics.max_error(y_true, y_pred))
# Mean squared log error cannot be used when the targets contain negative values
print('Mean Squared Log Error:', metrics.mean_squared_log_error(y_true, y_pred))
print('Median Absolute Error:', metrics.median_absolute_error(y_true, y_pred))
print('R^2:', metrics.r2_score(y_true, y_pred))
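# Poisson deviance requires y_true >= 0 and y_pred > 0; Gamma deviance requires both to be strictly positive
print('Mean Poisson Deviance:', metrics.mean_poisson_deviance(y_true, y_pred))
print('Mean Gamma Deviance:', metrics.mean_gamma_deviance(y_true, y_pred))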
If you need more information about the individual metrics, have a look at the scikit-learn User Guide.
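To connect these metrics to the random forest from the question, here is a minimal sketch on synthetic data. The dataset from make_regression, the split sizes and the hyperparameter values are assumptions chosen only for illustration. It also sets oob_score=True, which for a regressor stores an R^2 computed on each training sample using only the trees that did not see that sample, so it can be read as a rough estimate of test R^2 without needing a held-out set.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); replace with your own X and y.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# oob_score=True evaluates every training sample with the trees
# whose bootstrap sample did not contain it.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # works on every scikit-learn version
r2 = metrics.r2_score(y_test, y_pred)  # same value as rf.score(X_test, y_test)

print('MAE:', round(mae, 2))
print('RMSE:', round(rmse, 2))
print('Test R^2:', round(r2, 3))
print('OOB R^2:', round(rf.oob_score_, 3))
```

If the OOB R^2 and the test R^2 are close, the model generalises about as well as the training data suggests. An RMSE that is much larger than the MAE points to a few large errors dominating the squared loss.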
Upvotes: 7
Reputation: 121
For a regression model (not to be confused with a classifier), you can evaluate MAE, MSE, MAPE and RMSE with sklearn:
import numpy as np
from sklearn import metrics
print('Mean Absolute Error (MAE):', metrics.mean_absolute_error(gt, pred))
print('Mean Squared Error (MSE):', metrics.mean_squared_error(gt, pred))
print('Root Mean Squared Error (RMSE):', np.sqrt(metrics.mean_squared_error(gt, pred)))
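# gt (ground truth) and pred (predictions) are NumPy arrays of equal length
# Manual MAPE as a fraction; this divides by zero if gt contains zeros
mape = np.mean(np.abs((gt - pred) / np.abs(gt)))
print('Mean Absolute Percentage Error (MAPE):', round(mape * 100, 2))
# 100 * (1 - MAPE) is an informal "accuracy" figure, not a standard regression metric
print('Accuracy:', round(100*(1 - mape), 2))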
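As a quick check of the manual MAPE formula above, the following sketch compares it with sklearn.metrics.mean_absolute_percentage_error (available from scikit-learn 0.24). The small arrays are made-up illustration values, not real data. Both versions return a fraction, so multiply by 100 for a percentage. The second part shows what happens when the ground truth contains a zero.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Made-up illustration values (assumption), with no zeros in gt.
gt = np.array([100.0, 200.0, 50.0, 80.0])
pred = np.array([110.0, 190.0, 55.0, 76.0])

manual_mape = np.mean(np.abs((gt - pred) / np.abs(gt)))
library_mape = mean_absolute_percentage_error(gt, pred)

print('Manual MAPE (%):', round(manual_mape * 100, 2))    # 7.5
print('Library MAPE (%):', round(library_mape * 100, 2))  # 7.5

# With a zero in the ground truth, the manual formula divides by zero.
gt_zero = np.array([0.0, 200.0])
pred_zero = np.array([10.0, 190.0])

with np.errstate(divide='ignore'):
    manual_zero = np.mean(np.abs((gt_zero - pred_zero) / np.abs(gt_zero)))

# The library clips the denominator to a tiny epsilon, so the result
# is finite but extremely large rather than inf.
library_zero = mean_absolute_percentage_error(gt_zero, pred_zero)

print('Manual MAPE with a zero target:', manual_zero)
print('Library MAPE is finite:', np.isfinite(library_zero))
```

Neither result is meaningful when targets are at or near zero. In that case MAE or RMSE is usually easier to interpret than a percentage error.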
Upvotes: 12
Reputation: 64
If your forest is a classifier rather than a regressor, you can also add these two metrics (they do not apply to regression targets):
from sklearn.metrics import accuracy_score, confusion_matrix
accuracy_score(my_class_column, my_forest_train_prediction)
confusion_matrix(my_test_data, my_prediction_test_forest)
The predicted class probabilities for each sample can also be obtained:
my_classifier_forest.predict_proba(X)  # X: 2D array of shape (n_samples, n_features), i.e. columns variable 1 ... variable n
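Note that accuracy_score, confusion_matrix and predict_proba only make sense for a classification forest, not for the RandomForestRegressor in the question. A minimal sketch of how they fit together, assuming a synthetic binary dataset from make_classification and illustrative variable names:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption, for illustration only).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Both metrics take the true labels first, then the predicted labels.
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# predict_proba takes the 2D feature array and returns one row per
# sample with one column per class; each row sums to 1.
proba = clf.predict_proba(X_test)

print('Accuracy:', round(acc, 3))
print('Confusion matrix:')
print(cm)
print('Probabilities for the first sample:', proba[0])
```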
Upvotes: -3