FredNunes

Reputation: 35

How is the best_score_ attribute of RandomizedSearchCV calculated?

I just ran into an issue when trying to validate the best_score_ value for my randomized search.

I just ran a RandomizedSearchCV and got best_score_=0.05325203252032521. Then I tried to calculate this value manually, based on the information contained inside the RandomizedSearchCV object. What I did was:

# cv is the fitted RandomizedSearchCV; best_model_idx is the index of the best candidate in cv_results_
print(best_model_idx)

results_in_splits = []

# collect the per-split test scores of the best candidate
for k, v in cv.cv_results_.items():
    if 'split' in k:
        print('\t->', k)
        results_in_splits.append(v[best_model_idx])
    else:
        print(k)

print('\n')
# manual mean of the split scores vs. the reported best score
print(sum(results_in_splits) / len(results_in_splits))
print(cv.best_score_)

This yielded the following output:

0
mean_fit_time
std_fit_time
mean_score_time
std_score_time
param_subsample
param_n_estimators
param_min_child_weight
param_max_depth
param_gamma
param_colsample_bytree
params
    -> split0_test_score
    -> split1_test_score
    -> split2_test_score
    -> split3_test_score
    -> split4_test_score
    -> split5_test_score
    -> split6_test_score
    -> split7_test_score
    -> split8_test_score
    -> split9_test_score
    -> split10_test_score
    -> split11_test_score
    -> split12_test_score
mean_test_score
std_test_score
rank_test_score

As you can see, we obtain a different result (0.046 vs. 0.053), and in some other experiments the difference is even more drastic.

Can anyone help me clear this up? It would be greatly appreciated!

Thanks.

Upvotes: 0

Views: 3118

Answers (2)

desertnaut

Reputation: 60321

UPDATE: As per OP's comment below, upgrading scikit-learn from v0.21.3 to v0.22.2 resolved the issue.


As I already mentioned in the comments, I have been unable to reproduce your issue, either with the iris data or with dummy data from several configurations of scikit-learn's make_classification.
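
For reference, here is a minimal sketch of the kind of check I mean; the dummy data, the RandomForestClassifier, and the parameter values below are only placeholders (not your XGBoost setup):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# dummy data in place of the original dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

cv = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {'n_estimators': [50, 100, 200], 'max_depth': [3, 5, None]},
    n_iter=5,
    cv=13,           # same number of splits as in your output
    random_state=0,
)
cv.fit(X, y)

# average the per-split test scores of the best candidate by hand
best_model_idx = cv.best_index_
split_scores = [v[best_model_idx] for k, v in cv.cv_results_.items()
                if k.startswith('split') and k.endswith('_test_score')]
print(np.mean(split_scores))
print(cv.best_score_)

On a recent scikit-learn version, the two printed numbers come out identical.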

Running the whole script (code + data) you have posted at Pastebin does not change this; here are the last lines of your own code:

results_in_splits = []

for k, v in cv.cv_results_.items():
    if 'split' in k:
        print('\t->', k)
        results_in_splits.append(v[best_model_idx])
    else:
        print(k)

print('\n')
print(sum(results_in_splits) / len(results_in_splits))
print(cv.best_score_)

the output being

mean_fit_time
std_fit_time
mean_score_time
std_score_time
param_subsample
param_n_estimators
param_min_child_weight
param_max_depth
param_gamma
param_colsample_bytree
params
    -> split0_test_score
    -> split1_test_score
    -> split2_test_score
    -> split3_test_score
    -> split4_test_score
    -> split5_test_score
    -> split6_test_score
    -> split7_test_score
    -> split8_test_score
    -> split9_test_score
    -> split10_test_score
    -> split11_test_score
    -> split12_test_score
mean_test_score
std_test_score
rank_test_score


0.8926320979964705
0.8926320979964705

i.e. the two scores are indeed identical, as they should be.

The almost identical scores in your CV splits, discussed in the comments of the other answer here, are also not a bug; they are an artifact of an unfortunate situation: too small a dataset (678 samples) combined with too many CV splits (13) leaves only 13-14 validation samples per split, and any statistic calculated on so few samples is spurious and should not be relied upon.

But this last observation is actually irrelevant to your main question here: what you report is not reproducible in a variety of situations, including with the script and data you provided yourself.

Upvotes: 0

Batuhan B

Reputation: 1855

RandomizedSearchCV tries to find the best parameters for your model. To do this, it trains your model again and again with cross-validation, once for each parameter combination, and calculates the mean cross-validation score for each parameter setup.

It then picks the highest mean cross-validated score and returns the best parameters and the best score of your model.

In summary:

  • It tries N different parameter combinations on your dataset.
  • For each parameter combination, it trains the model with cross-validation.
  • It then averages the fold scores from cross-validation and assigns that mean score to the corresponding parameter combination.
  • It then looks at all the results and selects the highest one.
  • Finally, it returns the best score, best model, best parameters, etc. (see the sketch below).
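
In other words, best_score_ is just the mean_test_score of the winning parameter combination, i.e. the average of its per-split test scores. Here is a minimal sketch on the iris data (the LogisticRegression estimator and the C values are placeholders, not the model from the question):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {'C': [0.01, 0.1, 1, 10]},
    n_iter=3,
    cv=5,
    random_state=42,
)
search.fit(X, y)

# the reported best score is the mean test score of the best candidate
print(search.best_score_)
print(search.cv_results_['mean_test_score'][search.best_index_])

Both prints show the same number.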

Upvotes: 2
