How does GridSearchCV compute training scores?

Question

I'm having a hard time understanding the return_train_score parameter of GridSearchCV. From the docs:

return_train_score : boolean, optional

If False, the cv_results_ attribute will not include training scores.

My question is: what are the training scores?

In the following code I'm splitting the data into ten stratified folds. As a consequence, grid.cv_results_ contains ten test scores, namely 'split0_test_score', 'split1_test_score', ..., 'split9_test_score'. I'm aware that each of those is the accuracy obtained by a 5-nearest-neighbors classifier that uses the corresponding fold for testing and the remaining nine folds for training.

grid.cv_results_ also contains ten train scores: 'split0_train_score', 'split1_train_score', ..., 'split9_train_score'. How are these values calculated?

from sklearn import datasets
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# return_X_y must be passed by keyword in recent scikit-learn releases
X, y = datasets.load_iris(return_X_y=True)

# random_state is dropped: it has no effect without shuffle=True,
# and recent scikit-learn releases raise an error for that combination
skf = StratifiedKFold(n_splits=10)
knn = KNeighborsClassifier()

grid = GridSearchCV(estimator=knn,
                    cv=skf,
                    param_grid={'n_neighbors': [5]},
                    return_train_score=True)
grid.fit(X, y)

print('Mean test score: {}'.format(grid.cv_results_['mean_test_score']))
print('Mean train score: {}'.format(grid.cv_results_['mean_train_score']))
# Mean test score: [ 0.96666667]
# Mean train score: [ 0.96888889]

Jan K · Accepted Answer

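A train score is the score of the fitted model on the same data it was trained on, that is, on all folds except the one held out for testing. In your case, each 'splitN_train_score' is the accuracy measured on the nine folds the model was fitted on in split N, and 'mean_train_score' is the average of those ten values.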
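
One way to convince yourself is to reproduce the per-split scores by hand. The following is a minimal sketch, assuming a recent scikit-learn release and the same unshuffled StratifiedKFold as in the question. The names manual_train, manual_test, grid_train and grid_test are only illustrative and do not come from the library.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)

# Unshuffled stratified folds, so the splits are deterministic
skf = StratifiedKFold(n_splits=10)
knn = KNeighborsClassifier(n_neighbors=5)

grid = GridSearchCV(estimator=knn,
                    cv=skf,
                    param_grid={'n_neighbors': [5]},
                    return_train_score=True)
grid.fit(X, y)

# Recompute the scores manually: fit on the nine training folds, then
# score on those same nine folds (train) and on the held-out fold (test)
manual_train, manual_test = [], []
for train_idx, test_idx in skf.split(X, y):
    model = clone(knn).fit(X[train_idx], y[train_idx])
    manual_train.append(model.score(X[train_idx], y[train_idx]))
    manual_test.append(model.score(X[test_idx], y[test_idx]))

# Per-split scores stored by GridSearchCV (one parameter setting, index 0)
grid_train = [grid.cv_results_['split{}_train_score'.format(i)][0]
              for i in range(10)]
grid_test = [grid.cv_results_['split{}_test_score'.format(i)][0]
             for i in range(10)]

print('Train scores match: {}'.format(np.allclose(manual_train, grid_train)))
print('Test scores match: {}'.format(np.allclose(manual_test, grid_test)))
```

If the two checks agree, the train score of each split is simply the estimator's score() evaluated on the data it was fitted on, which for a classifier is accuracy by default.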
