Tonechas

Reputation: 13733

How does GridSearchCV compute training scores?

I'm having a hard time figuring out the parameter return_train_score in GridSearchCV. From the docs:

return_train_score : boolean, optional

       If False, the cv_results_ attribute will not include training scores.
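To see concretely what this docstring means, here is a minimal sketch that fits the same grid search twice, once with return_train_score=True and once with False, and diffs the keys of cv_results_. It assumes a recent scikit-learn (where load_iris takes the keyword return_X_y and return_train_score defaults to False); the helper name result_keys is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=10)  # unshuffled, so both runs see identical folds


def result_keys(return_train_score):
    """Hypothetical helper: run the grid search and return the cv_results_ keys."""
    grid = GridSearchCV(KNeighborsClassifier(),
                        param_grid={"n_neighbors": [5]},
                        cv=skf,
                        return_train_score=return_train_score)
    grid.fit(X, y)
    return set(grid.cv_results_)


with_train = result_keys(True)
without_train = result_keys(False)

# The only extra entries are the per-split, mean and std training scores
print(sorted(with_train - without_train))
```

Note that there is no rank_train_score: ranking is done on the test scores only.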

My question is: what are the training scores?

In the following code I'm splitting the data into ten stratified folds. As a consequence, grid.cv_results_ contains ten test scores, namely 'split0_test_score', 'split1_test_score', ..., 'split9_test_score'. I'm aware that each of those is the accuracy obtained by a 5-nearest-neighbors classifier that uses the corresponding fold for testing and the remaining nine folds for training.

grid.cv_results_ also contains ten train scores: 'split0_train_score', 'split1_train_score', ..., 'split9_train_score'. How are these values calculated?

from sklearn import datasets
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# return_X_y must be passed by keyword in current scikit-learn
X, y = datasets.load_iris(return_X_y=True)

# Unshuffled folds; random_state only matters with shuffle=True (newer versions reject it otherwise)
skf = StratifiedKFold(n_splits=10)
knn = KNeighborsClassifier()

grid = GridSearchCV(estimator=knn, 
                    cv=skf, 
                    param_grid={'n_neighbors': [5]}, 
                    return_train_score=True)
grid.fit(X, y)

print('Mean test score: {}'.format(grid.cv_results_['mean_test_score']))
print('Mean train score: {}'.format(grid.cv_results_['mean_train_score']))
#Mean test score: [ 0.96666667]
#Mean train score: [ 0.96888889]

Upvotes: 6

Views: 11540

Answers (2)

Vivek Kumar

Reputation: 36599

Maybe my other answer here will give you a clearer understanding of how grid search works.

Essentially, the training scores are the scores of the model on the same data it was trained on.

In each fold split, the data is divided into two parts: train and test. The train data is used to fit() the internal estimator, and the test data is used to check its performance. The training score just tells you how well the model fits the training data.
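The per-split procedure described above can be reproduced by hand. The sketch below assumes the default scoring (for a classifier, GridSearchCV falls back to estimator.score, i.e. accuracy) and an unshuffled StratifiedKFold so the splits are identical; it refits a clone of the estimator on each training split, scores it on those same training rows and on the held-out fold, and compares the results with cv_results_. Both comparisons should come out True.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=10)  # deterministic splits, reused below
knn = KNeighborsClassifier(n_neighbors=5)

grid = GridSearchCV(knn, param_grid={"n_neighbors": [5]},
                    cv=skf, return_train_score=True)
grid.fit(X, y)

manual_train, manual_test = [], []
for train_idx, test_idx in skf.split(X, y):
    model = clone(knn).fit(X[train_idx], y[train_idx])
    # Training score: accuracy on the very rows the model was fitted on
    manual_train.append(model.score(X[train_idx], y[train_idx]))
    # Test score: accuracy on the held-out fold
    manual_test.append(model.score(X[test_idx], y[test_idx]))

# cv_results_ stores one value per parameter candidate; there is only one here
grid_train = [grid.cv_results_[f"split{i}_train_score"][0] for i in range(10)]
grid_test = [grid.cv_results_[f"split{i}_test_score"][0] for i in range(10)]

print(np.allclose(manual_train, grid_train), np.allclose(manual_test, grid_test))
```

For k-NN in particular, every training row is among its own nearest neighbours (at distance zero) when the model is scored on its training data, so the training score tends to be optimistic compared with the test score.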

Upvotes: 2

Jan K

Reputation: 4150

It is the score of the fitted model on all folds except the one you are testing on. In your case, it is the accuracy over the 9 folds you trained the model on.
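A quick way to check the "9 folds" claim is the sketch below, which uses cross_validate (which also accepts return_train_score) with the same unshuffled StratifiedKFold. Iris has 150 samples, 50 per class, so each stratified fold holds 15 samples and each training set holds the other 135. The mean of res["train_score"] is the quantity GridSearchCV reports as mean_train_score for a single candidate.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=10)

res = cross_validate(KNeighborsClassifier(n_neighbors=5), X, y,
                     cv=skf, return_train_score=True)

# Size of the data each per-split training score is computed on
train_sizes = [len(train_idx) for train_idx, _ in skf.split(X, y)]
print(train_sizes)  # 135 each: 9 folds x 15 samples

print(res["train_score"].mean(), res["test_score"].mean())
```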

Upvotes: 4
