Zhendong Cao

Reputation: 169

How to calculate MSE criteria in RandomForestRegression?

I'm now using RandomForestRegressor from sklearn.ensemble to analyze a dataset and I select "mse" as the function to measure the quality of a split. But I'm not quite clear how the mse is calculated. Could anyone explain it to me here (better with equations) or provide me some references on that? Thank you in advance.

Upvotes: 2

Views: 1839

Answers (1)

Mario

Reputation: 1966

Short answer:

Mean Squared Error (MSE) is calculated by squaring all of the errors (so that none of them is negative) and then taking the mean of those squares. It is a single value; "for each tree, we get a difference between two MSE values. Averaging over trees gives the mean difference between the two MSE values". ref

... However, the Random Forest calculates the MSE from predictions on the same data it was trained on. Each tree predicts only the samples that were not drawn into its bootstrap sample, that is, the samples that are out-of-bag (OOB) for that tree. The OOB predictions are then averaged for each sample. ... ref
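The out-of-bag bookkeeping described in the quote can be sketched in plain Python. This is only an illustration under loud assumptions: the targets are made up, and each "tree" is replaced by a stand-in that predicts the mean of its in-bag targets, so it shows how OOB predictions are collected and averaged, not how scikit-learn builds its trees.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Made-up targets; features are omitted because the stand-in "tree" ignores them.
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
n = len(y)
n_trees = 200

oob_sum = [0.0] * n   # running sum of OOB predictions per sample
oob_count = [0] * n   # number of trees for which the sample was out-of-bag

for _ in range(n_trees):
    # Bootstrap: draw n indices with replacement.
    in_bag = [random.randrange(n) for _ in range(n)]
    in_bag_set = set(in_bag)
    # Stand-in for a fitted tree (assumption): predict the mean of the in-bag targets.
    prediction = sum(y[i] for i in in_bag) / n
    for i in range(n):
        if i not in in_bag_set:
            oob_sum[i] += prediction
            oob_count[i] += 1

# Average the OOB predictions per sample, then take the MSE over the samples
# that were out-of-bag for at least one tree.
covered = [i for i in range(n) if oob_count[i] > 0]
oob_pred = {i: oob_sum[i] / oob_count[i] for i in covered}
oob_mse = sum((y[i] - oob_pred[i]) ** 2 for i in covered) / len(covered)
print(oob_mse)
```

With a real forest, the per-tree prediction would come from the fitted tree evaluated on each out-of-bag sample rather than from a single constant.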

MSE is one of the common cost functions. Suppose the green line in the following picture is your model and the blue points are the data (observations). As its name suggests, MSE is the mean of the squared distances between the data points and the line. Each squared distance can be drawn as the area of a square, and together these areas represent your model's errors.

[Figure: blue data points and the green model line, with the error of each point drawn as a square]

MSE can be calculated by:

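$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

where $y_i$ is the observed value, $\hat{y}_i$ is the model's prediction for it, and $n$ is the number of data points.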

It shows how good or bad the model is. The smaller the MSE, the better the model fits the data.
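As a quick check of the definition, here is a minimal sketch that computes the MSE by hand. The helper name mse_by_hand and the numbers are made up for illustration.

```python
def mse_by_hand(y_true, y_pred):
    """Return the mean of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Made-up observations and model predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

# Errors are 1, 0, -2, -1; their squares are 1, 0, 4, 1; the mean is 6 / 4.
mse = mse_by_hand(y_true, y_pred)
print(mse)  # 1.5
```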

More info:

Understanding Regression Error Metrics in Python

Introduction to Loss Functions

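Update 30.05.2019: To verify this, you can check the documentation and, if needed, the source code. According to the RandomForestRegressor() documentation, the MSE criterion is equal to variance reduction as the feature selection criterion, and the source code uses it to measure the quality of a split. Note that criterion only accepts the name of a built-in criterion as a string ("mse" in older scikit-learn releases, "squared_error" in newer ones); it cannot take a precomputed score. If you want to check the MSE independently of the splitting criterion, compute it as an evaluation metric on the model's predictions, like this:

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# criterion takes the name of a split criterion, not a computed value
# ("squared_error" in scikit-learn >= 1.0; "mse" in older releases)
est = RandomForestRegressor(criterion="squared_error")
est.fit(X_train, y_train)
# MSE of the fitted model on held-out data
mse = mean_squared_error(y_test, est.predict(X_test))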

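or, equivalently, using NumPy:

import numpy as np
# same evaluation metric, computed directly from the residuals of a fitted estimator est
mse = np.mean((y_test - est.predict(X_test))**2)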
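To make the "variance reduction" reading of the split criterion concrete, here is a simplified single-feature sketch. It is not scikit-learn's implementation: the helper names node_mse and best_split and the data are made up, and it assumes candidate thresholds are midpoints between consecutive distinct feature values. A node predicts the mean of its targets, so its MSE is the variance of those targets, and the best split is the one with the lowest sample-weighted MSE of the two children.

```python
def node_mse(y):
    """MSE of a node that predicts the mean of its targets (i.e. their variance)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def best_split(x, y):
    """Return (threshold, weighted child MSE) for the best split on one feature."""
    pairs = sorted(zip(x, y))
    best = (None, node_mse(y))  # no split: keep the parent MSE
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        # Weight each child's MSE by its share of the samples.
        weighted = (len(left) * node_mse(left) + len(right) * node_mse(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up data with an obvious jump between x = 3 and x = 10.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

parent_mse = node_mse(y)
threshold, child_mse = best_split(x, y)
print(parent_mse, threshold, child_mse, parent_mse - child_mse)  # 4.0 6.5 0.0 4.0
```

The last printed value is the reduction in MSE achieved by the split. A real forest repeats this search over a random subset of features at every node of every tree.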

More info

Upvotes: 2
