Abrar Hossain

Reputation: 109

ValueError: 'mean_squared_error' is not a valid scoring value

So, I have been working on my first ML project, and as part of it I have been trying out various models from scikit-learn. I wrote this piece of code for a random forest model:

#Random Forest
reg = RandomForestRegressor(random_state=0, criterion = 'mse')
#Apply grid search for best parameters
params = {'randomforestregressor__n_estimators' : range(100, 500, 200),
          'randomforestregressor__min_samples_split' : range(2, 10, 3)}
pipe = make_pipeline(reg)
grid = GridSearchCV(pipe, param_grid = params, scoring='mean_squared_error', n_jobs=-1, iid=False, cv=5)
reg = grid.fit(X_train, y_train)
print('Best MSE: ', grid.best_score_)
print('Best Parameters: ', grid.best_estimator_)

y_train_pred = reg.predict(X_train)
y_test_pred = reg.predict(X_test)
tr_err = mean_squared_error(y_train_pred, y_train)
ts_err = mean_squared_error(y_test_pred, y_test)
print(tr_err, ts_err)
results_train['random_forest'] = tr_err
results_test['random_forest'] = ts_err

But, when I run this code, I get the following error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py in get_scorer(scoring)
    359             else:
--> 360                 scorer = SCORERS[scoring]
    361         except KeyError:

KeyError: 'mean_squared_error'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-149-394cd9e0c273> in <module>
      5 pipe = make_pipeline(reg)
      6 grid = GridSearchCV(pipe, param_grid = params, scoring='mean_squared_error', n_jobs=-1, iid=False, cv=5)
----> 7 reg = grid.fit(X_train, y_train)
      8 print('Best MSE: ', grid.best_score_)
      9 print('Best Parameters: ', grid.best_estimator_)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    652         cv = check_cv(self.cv, y, classifier=is_classifier(estimator))
    653 
--> 654         scorers, self.multimetric_ = _check_multimetric_scoring(
    655             self.estimator, scoring=self.scoring)
    656 

~\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py in _check_multimetric_scoring(estimator, scoring)
    473     if callable(scoring) or scoring is None or isinstance(scoring,
    474                                                           str):
--> 475         scorers = {"score": check_scoring(estimator, scoring=scoring)}
    476         return scorers, False
    477     else:

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

~\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py in check_scoring(estimator, scoring, allow_none)
    403                         "'fit' method, %r was passed" % estimator)
    404     if isinstance(scoring, str):
--> 405         return get_scorer(scoring)
    406     elif callable(scoring):
    407         # Heuristic to ensure user has not passed a metric

~\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py in get_scorer(scoring)
    360                 scorer = SCORERS[scoring]
    361         except KeyError:
--> 362             raise ValueError('%r is not a valid scoring value. '
    363                              'Use sorted(sklearn.metrics.SCORERS.keys()) '
    364                              'to get valid options.' % scoring)

ValueError: 'mean_squared_error' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options.

So, I tried running it after removing scoring='mean_squared_error' from GridSearchCV(pipe, param_grid = params, scoring='mean_squared_error', n_jobs=-1, iid=False, cv=5). When I do that, the code runs perfectly and gives decent enough training and testing errors.

Regardless of that, I can't figure out why passing scoring='mean_squared_error' to GridSearchCV throws that error. What am I doing wrong?

Upvotes: 3

Views: 16732

Answers (1)

afsharov

Reputation: 5164

According to the documentation:

All scorer objects follow the convention that higher return values are better than lower return values. Thus metrics which measure the distance between the model and the data, like metrics.mean_squared_error, are available as neg_mean_squared_error which return the negated value of the metric.

This means that you have to pass scoring='neg_mean_squared_error' in order to evaluate the grid search results with Mean Squared Error.
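Concretely, here is a minimal sketch of the fix, reusing the pipe, params, and training data from the question and changing only the scoring argument (note that best_score_ will then be the negated MSE, so flip its sign to report the error itself):

# Same grid search as in the question, but with the negated-MSE scorer
grid = GridSearchCV(pipe, param_grid=params, scoring='neg_mean_squared_error',
                    n_jobs=-1, iid=False, cv=5)
reg = grid.fit(X_train, y_train)

# best_score_ follows the "higher is better" convention, so it holds the
# negative of the cross-validated MSE; negate it to get the MSE itself
print('Best MSE: ', -grid.best_score_)
print('Best Parameters: ', grid.best_estimator_)

You can list every accepted scoring string with sorted(sklearn.metrics.SCORERS.keys()), as the error message itself suggests.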

Upvotes: 12
