Marcus

Reputation: 339

sklearn mean_squared_error ignores the squared argument with multioutput='raw_values'

The documentation page for sklearn's mean_squared_error function gives some examples of how to use it, including for multioutput data and for calculating the RMSE. The problem is that the RMSE calculation does not work on multiple outputs: with multioutput='raw_values', squared=False appears to be ignored and the MSE is returned instead.

Here is the code I used:

from sklearn.metrics import mean_squared_error

y_true = [[0.5, 1],[-1, 1],[7, -6]]
y_pred = [[0, 2],[-1, 2],[8, -5]]

mean_squared_error(y_true, y_pred)  # This returns the MSE
#out: 0.7083333333333334

mean_squared_error(y_true, y_pred, squared=False)  # And the RMSE works too
#out: 0.8416254115301732

mean_squared_error(y_true, y_pred, multioutput='raw_values')  # I can use the MSE for multiple outputs
#out: array([0.41666667, 1.        ])

mean_squared_error(y_true, y_pred, multioutput='raw_values', squared=False)  # But not the RMSE
#out: array([0.41666667, 1.        ])

# However
import numpy as np

np.sqrt(mean_squared_error(y_true, y_pred, multioutput='raw_values'))  # Numpy gives the correct results
#out: array([0.64549722, 1.        ])

Some specifications:

Python 3.6.8 (default, Oct  7 2019, 12:59:55)
[GCC 8.3.0] on linux

sklearn.__version__
'0.22'

np.__version__
'1.17.4'

I looked at the source code but I don't see why this does not work.

Upvotes: 1

Views: 3567

Answers (1)

Trenton McKinney

Reputation: 62553

  • This is a known issue that has since been closed. It does not occur in sklearn 0.23.2, the current version as of this answer.

  • This is not reproducible in numpy 1.19.1 and sklearn 0.23.2

  • mean_squared_error(y_true, y_pred, multioutput='raw_values', squared=False) and np.sqrt(mean_squared_error(y_true, y_pred, multioutput='raw_values')) return the same value.

  • The resolution is to upgrade.

  • If upgrading is not an option:
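For the no-upgrade case in the last bullet, one possible workaround is the np.sqrt call already shown in the question: compute the per-output MSE and take the square root yourself. The sketch below assumes that approach. The helper name rmse_per_output is hypothetical and not part of sklearn. It avoids the squared argument entirely, so it should not depend on how a given sklearn version handles it.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Same data as in the question
y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]


def rmse_per_output(y_true, y_pred):
    """Hypothetical helper: RMSE for each output column.

    Takes the square root of the per-output MSE instead of
    relying on squared=False.
    """
    mse = mean_squared_error(y_true, y_pred, multioutput='raw_values')
    return np.sqrt(mse)


rmse = rmse_per_output(y_true, y_pred)
print(rmse)  # approximately [0.64549722 1.], matching the np.sqrt result in the question
```

This gives one RMSE per output column. If a single aggregate number is needed, how to combine the columns (for example, averaging the per-output RMSEs) is a separate choice that the sketch leaves to the caller.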

Upvotes: 2
