Reputation: 241
While doing GridSearchCV, what is the difference between the scores obtained through grid.score(...) and grid.best_score_?
Kindly assume that a model, features, target, and param_grid are in place. Here is the part of the code I am curious about.
grid = GridSearchCV(estimator=my_model, param_grid=params, cv=3,
                    return_train_score=True, scoring='neg_mean_squared_error')
grid.fit(X_train, y_train)
best_score_1 = grid.score(X_valid, y_valid)
best_score_2 = grid.best_score_
best_score_1 and best_score_2 give two different outputs. I am trying to understand the difference between the two, as well as which one should be considered the best score that came out of the given param_grid.
Following is the full function.
def apply_grid(df, model, features, target, params, test=False):
    '''
    Performs GridSearchCV after re-splitting the dataset, provides
    a comparison between the train MSE and test MSE to check for
    generalization, and optionally deploys the best-found parameters
    on the test set as well.
    Args:
        df: DataFrame
        model: a model class to use
        features: features to consider
        target: labels
        params: param_grid for optimization
        test: False by default; if True, predicts on the test set
    Returns:
        MSE scores on the models and a slice of cv_results_
        to compare the models' generalization performance
    '''
    my_model = model()
    # Split the dataset into train and test
    X_train, X_test, y_train, y_test = train_test_split(df[features],
                                                        df[target], random_state=0)
    # Re-split the train set into train2 and valid to keep the test set separate
    X_train2, X_valid, y_train2, y_valid = train_test_split(X_train,
                                                            y_train, random_state=0)
    # Use grid search to find the best parameters from the param_grid
    grid = GridSearchCV(estimator=my_model, param_grid=params, cv=3,
                        return_train_score=True, scoring='neg_mean_squared_error')
    grid.fit(X_train2, y_train2)
    # Evaluate on the valid set
    scores = grid.score(X_valid, y_valid)
    print('Best MSE through GridSearchCV: ', grid.best_score_)  # CONFUSION
    print('Best MSE through GridSearchCV: ', scores)            # CONFUSION
    print('I AM CONFUSED ABOUT THESE TWO OUTPUTS ABOVE. WHY ARE THEY DIFFERENT?')
    print('Best Parameters: ', grid.best_params_)
    print('-' * 120)
    print('mean_test_score is rather mean_valid_score')
    report = pd.DataFrame(grid.cv_results_)
    # If test is True, deploy the best_params_ on the test set
    if test:
        my_model = model(**grid.best_params_)
        my_model.fit(X_train, y_train)
        predictions = my_model.predict(X_test)
        mse = mean_squared_error(y_test, predictions)
        print('TEST MSE with the best params: ', mse)
        print('-' * 120)
    return report[['mean_train_score', 'mean_test_score']]
Upvotes: 0
Views: 818
Reputation: 56
UPDATED
As explained in the sklearn documentation, GridSearchCV takes the lists of parameter values you pass and tries all possible combinations to find the best parameters.
To evaluate which parameters are best, it runs a k-fold cross-validation for each parameter combination. In k-fold cross-validation, the training set is divided into a training part and a validation part. If you choose, for example, cv=5,
the dataset is divided into 5 non-overlapping folds, and each fold is used once as the validation set while all the others are used as the training set. So, in this example, GridSearchCV computes the average validation score (which can be accuracy, negative MSE, or whatever scoring you chose) over the 5 folds for each parameter combination. At the end of the search there is one average validation score per parameter combination, and the combination with the highest average validation score is selected. That average validation score, associated with the best parameters, is what is stored in the grid.best_score_
variable.
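A minimal sketch of this, on hypothetical synthetic data (the Ridge model and alpha grid are just placeholders): grid.best_score_ is the best mean cross-validated score found in cv_results_, not a score computed on any held-out data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hypothetical regression data for illustration only
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

grid = GridSearchCV(Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]},
                    cv=5, scoring='neg_mean_squared_error')
grid.fit(X, y)

# best_score_ is exactly the highest mean validation score in cv_results_
assert np.isclose(grid.best_score_,
                  grid.cv_results_['mean_test_score'].max())
print(grid.best_params_, grid.best_score_)
```

Note that with scoring='neg_mean_squared_error' the score is negated MSE, so "highest" means closest to zero.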
On the other hand, the grid.score(X_valid, y_valid)
method gives the score on the given data, provided the estimator has been refitted (refit=True)
. This means it is not the average score over the 5 folds: the model with the best parameters is refit on the whole training set, predictions are then computed on X_valid
and compared with y_valid
in order to get the score.
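To make the difference concrete, a sketch on hypothetical synthetic data (model and grid are again placeholders): grid.score(X_valid, y_valid) is the scorer applied to the refitted best_estimator_ on the held-out data, which generally differs from grid.best_score_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical regression data for illustration only
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

grid = GridSearchCV(Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]},
                    cv=5, scoring='neg_mean_squared_error')
grid.fit(X_train, y_train)

# Score of the refitted best model on held-out data
valid_score = grid.score(X_valid, y_valid)

# Equivalent to scoring best_estimator_ by hand with the same metric
manual = -mean_squared_error(y_valid, grid.best_estimator_.predict(X_valid))
assert np.isclose(valid_score, manual)
```

So best_score_ answers "how did the best parameter combination do on average during cross-validation", while score(X_valid, y_valid) answers "how does the final refitted model do on this particular held-out set" - two different questions, hence two different numbers.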
Upvotes: 1