Reputation: 339
The documentation page for sklearn's mean_squared_error function gives examples of how to use it, including for multioutput data and for calculating the RMSE. However, combining the two does not work: with multioutput='raw_values' and squared=False, the function still returns the per-output MSE instead of the RMSE.
Here is the code I used:
from sklearn.metrics import mean_squared_error
y_true = [[0.5, 1],[-1, 1],[7, -6]]
y_pred = [[0, 2],[-1, 2],[8, -5]]
mean_squared_error(y_true, y_pred) # This returns the MSE
#out: 0.7083333333333334
mean_squared_error(y_true, y_pred, squared=False) # And the RMSE works too
#out: 0.8416254115301732
mean_squared_error(y_true, y_pred, multioutput='raw_values') # I can use the MSE for multiple outputs
#out: array([0.41666667, 1. ])
mean_squared_error(y_true, y_pred, multioutput='raw_values', squared=False) # But not the RMSE
#out: array([0.41666667, 1. ])
# However
import numpy as np
np.sqrt(mean_squared_error(y_true, y_pred, multioutput='raw_values')) # Numpy gives the correct results
#out: array([0.64549722, 1. ])
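For reference, the expected per-output RMSE can be checked by hand: average the squared errors down each column, then take the square root of each column's mean. A minimal NumPy-only sketch (rmse_per_output is a hypothetical helper name, and sample weights are ignored):

```python
import numpy as np

def rmse_per_output(y_true, y_pred):
    """Root mean squared error for each output column (no sample weights)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Mean over samples (axis 0) of the squared errors: one MSE per output column.
    mse = np.mean((y_true - y_pred) ** 2, axis=0)
    # The square root has to be applied per column, after averaging.
    return np.sqrt(mse)

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
print(rmse_per_output(y_true, y_pred))
```

The first column's squared errors are 0.25, 0 and 1 (mean 0.41667, root about 0.645); every error in the second column is -1, so its RMSE is exactly 1.0. This agrees with the np.sqrt call above.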
Some specifications:
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
sklearn.__version__
'0.22'
np.__version__
'1.17.4'
I looked at the source code but I don't see why this does not work.
Upvotes: 1
Views: 3567
Reputation: 62553
This is a known issue, now closed, that does not occur in sklearn 0.23.2, the current version as of this answer.
With numpy 1.19.1 and sklearn 0.23.2, these two calls return the same values:
mean_squared_error(y_true, y_pred, multioutput='raw_values', squared=False)
np.sqrt(mean_squared_error(y_true, y_pred, multioutput='raw_values'))
The resolution is to upgrade.
If upgrading is not an option, patch the multioutput == 'raw_values' branch in the source of mean_squared_error, replacing
return output_errors
with
return output_errors if squared else np.sqrt(output_errors)
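To see why squared=False is ignored on the raw_values path, here is a simplified model of that branch before and after the one-line patch. It is a sketch, not the actual sklearn source: the function names are hypothetical, and sample_weight and custom multioutput weights are left out.

```python
import numpy as np

def _output_errors(y_true, y_pred):
    # Per-column MSE: average the squared errors over samples (axis 0).
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(diff ** 2, axis=0)

def mse_like_0_22(y_true, y_pred, multioutput="uniform_average", squared=True):
    output_errors = _output_errors(y_true, y_pred)
    if multioutput == "raw_values":
        # Early return: the `squared` flag is never consulted on this path.
        return output_errors
    mse = np.mean(output_errors)
    return mse if squared else np.sqrt(mse)

def mse_patched(y_true, y_pred, multioutput="uniform_average", squared=True):
    output_errors = _output_errors(y_true, y_pred)
    if multioutput == "raw_values":
        # Patched line: take the square root per output when squared=False.
        return output_errors if squared else np.sqrt(output_errors)
    mse = np.mean(output_errors)
    return mse if squared else np.sqrt(mse)

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
print(mse_like_0_22(y_true, y_pred, "raw_values", squared=False))  # per-output MSE
print(mse_patched(y_true, y_pred, "raw_values", squared=False))    # per-output RMSE
```

The uniform-average path already applied the square root after averaging, which is why squared=False worked there and only the raw_values path was affected.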
Upvotes: 2