Reputation: 13733
I'm having a hard time figuring out the parameter return_train_score in GridSearchCV. From the docs:
return_train_score : boolean, optional
    If False, the cv_results_ attribute will not include training scores.
My question is: what are the training scores?
In the following code I'm splitting data into ten stratified folds. As a consequence grid.cv_results_ contains ten test scores, namely 'split0_test_score', 'split1_test_score', ..., 'split9_test_score'. I'm aware that each of those is the success rate obtained by a 5-nearest neighbors classifier that uses the corresponding fold for testing and the remaining nine folds for training.
grid.cv_results_ also contains ten train scores: 'split0_train_score', 'split1_train_score', ..., 'split9_train_score'. How are these values calculated?
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold

# return_X_y must be passed by keyword in recent scikit-learn releases
X, y = datasets.load_iris(return_X_y=True)

# No shuffling, so random_state is omitted: it has no effect here and
# recent releases raise an error when it is set with shuffle=False
skf = StratifiedKFold(n_splits=10)
knn = KNeighborsClassifier()
grid = GridSearchCV(estimator=knn,
                    cv=skf,
                    param_grid={'n_neighbors': [5]},
                    return_train_score=True)
grid.fit(X, y)

print('Mean test score: {}'.format(grid.cv_results_['mean_test_score']))
print('Mean train score: {}'.format(grid.cv_results_['mean_train_score']))
# Mean test score: [ 0.96666667]
# Mean train score: [ 0.96888889]
Upvotes: 6
Views: 11540
Reputation: 36599
Maybe my other answer here will give you a clearer understanding of how grid search works.
Essentially, a training score is the score of the model on the same data it was trained on.
In each fold split, the data is divided into two parts: train and test. The train data is used to fit() the internal estimator, and the test data is used to check its performance. The training score simply tells you how well the model fits the training data.
Upvotes: 2
Reputation: 4150
It is the train score of the prediction model on all folds excluding the one you are testing on. In your case, it is the score over the 9 folds you trained the model on.
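To make this concrete, here is a minimal sketch that recomputes the per-split scores by hand and compares them with what GridSearchCV stores. It assumes the default scorer (accuracy, via the classifier's score() method) and reuses the question's setup with an unshuffled StratifiedKFold; the names manual_train, manual_test, grid_train and grid_test are only illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=10)          # deterministic: no shuffling
knn = KNeighborsClassifier(n_neighbors=5)

grid = GridSearchCV(estimator=knn,
                    cv=skf,
                    param_grid={'n_neighbors': [5]},
                    return_train_score=True)
grid.fit(X, y)

# Repeat the same splits by hand (illustrative variable names)
manual_train, manual_test = [], []
for train_idx, test_idx in skf.split(X, y):
    # Fit a fresh copy of the estimator on the nine training folds
    model = clone(knn).fit(X[train_idx], y[train_idx])
    # Train score: accuracy on the very same nine folds used for fitting
    manual_train.append(model.score(X[train_idx], y[train_idx]))
    # Test score: accuracy on the single held-out fold
    manual_test.append(model.score(X[test_idx], y[test_idx]))

# Per-split scores recorded by GridSearchCV for the only candidate (index 0)
grid_train = [grid.cv_results_['split{}_train_score'.format(i)][0]
              for i in range(skf.get_n_splits())]
grid_test = [grid.cv_results_['split{}_test_score'.format(i)][0]
             for i in range(skf.get_n_splits())]

print('Train scores match:', np.allclose(manual_train, grid_train))
print('Test scores match: ', np.allclose(manual_test, grid_test))
print('Mean train score:  ', np.mean(manual_train))
```

If the two sets of scores agree, that supports the reading that 'splitK_train_score' is the score of the model fitted for split K, evaluated on the nine folds it was trained on, and that 'mean_train_score' is the average of those ten values.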
Upvotes: 4