Dumb chimp

Reputation: 454

Difference between gridCV.score method on training data vs gridCV.best_score_

I have a question about the difference between the randomsearch.score method (score 1) and the randomsearch.best_score_ attribute (score 2).

Particularly when randomsearch.score is applied to X_train and y_train.

I thought RandomizedSearchCV automatically looks for the params that give the highest score on the training set? I would have assumed that randomsearch.score(X_train, y_train) would be the same as the score for randomsearch.best_params_?

from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor

def evaluate_model(model, param_grid, n_iter=100):
    random_search = RandomizedSearchCV(model,
                                       param_grid,
                                       cv=5,
                                       n_jobs=2,
                                       verbose=1,
                                       n_iter=n_iter)

    # X_train / y_train / X_test / y_test come from an earlier train/test split
    random_search.fit(X_train, y_train)

    print(random_search.score(X_train, y_train))  # Score 1
    print(random_search.best_score_)              # Score 2
    print(random_search.score(X_test, y_test))    # Score 3

    return random_search

rgr = GradientBoostingRegressor(n_estimators=50)
param_grid = {"max_depth": range(1, 10)}  # note: only 9 candidate settings here

gradient_boosting = evaluate_model(rgr, param_grid)

Instead, this returns:

Score 1: 0.9585014239352219
Score 2: 0.7129331788310186
Score 3: 0.7530744077231204

Upvotes: 2

Views: 370

Answers (1)

Shihab Shahriar Khan

Reputation: 5455

With random_search.score(X_train, y_train), you are testing on the same data you used for training, hence the high score. This is (almost) completely meaningless information**, as it doesn't tell you how well your model will perform on unseen data.

cv=5 means that, for each hyper-parameter setting, your training data is split into 5 folds, with 20% of the data used for testing and 80% used for training in each split. The results on these 5 test folds are then averaged. The highest such average over all sampled hyper-parameter settings is recorded in random_search.best_score_. So the crucial difference is that here you aren't evaluating performance on the same data used for training, hence the comparatively lower score.
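To see this concretely, best_score_ is simply the highest mean test-fold score stored in cv_results_. A minimal sketch, assuming the gradient_boosting object fitted in the question:

# best_score_ is the mean of the 5 test-fold scores for the winning setting
mean_test_scores = gradient_boosting.cv_results_["mean_test_score"]
best_index = gradient_boosting.best_index_

print(mean_test_scores[best_index])    # identical to gradient_boosting.best_score_
print(gradient_boosting.best_score_)
print(gradient_boosting.best_params_)  # the max_depth value that won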

random_search.score(X_test, y_test) is like best_score_ in that you are evaluating the model on unseen data, but it is a better indicator of actual generalization performance. Unlike for score 2, however, the model has been trained on 100% of the training data (as opposed to 80%). This is one possible explanation for why score 3 is better than score 2.
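To illustrate (assuming the default refit=True and the gradient_boosting object from the question): once the search finishes, the best hyper-parameters are used to refit a final estimator on all of X_train, and score() on the search object just delegates to that refitted estimator:

# With refit=True (the default), the search refits best_estimator_ on the
# full training set; score() on the search object delegates to it.
print(gradient_boosting.score(X_test, y_test))                  # Score 3
print(gradient_boosting.best_estimator_.score(X_test, y_test))  # same value

# Roughly equivalent manual refit on 100% of the training data:
manual = GradientBoostingRegressor(n_estimators=50, **gradient_boosting.best_params_)
manual.fit(X_train, y_train)
print(manual.score(X_test, y_test))  # should be close to Score 3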

**If this value is low, you know your model is underfitting and should try increasing model complexity, e.g. adding more hidden layers to a neural network, or increasing the max_depth of a decision tree.
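For example, with the question's setup you could widen the search toward more complex models (a hypothetical tweak, reusing evaluate_model and rgr from the question):

param_grid = {
    "max_depth": range(1, 15),       # allow deeper trees
    "n_estimators": [50, 100, 200],  # or more boosting stages
}
gradient_boosting = evaluate_model(rgr, param_grid, n_iter=30)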

Upvotes: 3
