FredNunes

Reputation: 35

How is the best_score_ attribute of RandomizedSearchCV calculated?

I just ran into an issue when trying to validate the best_score_ value for my randomized search.

I just ran a RandomizedSearchCV and got best_score_=0.05325203252032521. Then I tried to calculate this value manually, based on the information contained inside the RandomizedSearchCV object. What I did was:

# cv is the fitted RandomizedSearchCV; best_model_idx is the index of the best candidate in cv_results_
print(best_model_idx)

results_in_splits = []

# collect the per-split test scores of the best candidate
for k, v in cv.cv_results_.items():
    if 'split' in k:
        print('\t->', k)
        results_in_splits.append(v[best_model_idx])
    else:
        print(k)

print('\n')
# manual mean of the split scores vs. the reported best score
print(sum(results_in_splits) / len(results_in_splits))
print(cv.best_score_)

This yielded the following output:

0
mean_fit_time
std_fit_time
mean_score_time
std_score_time
param_subsample
param_n_estimators
param_min_child_weight
param_max_depth
param_gamma
param_colsample_bytree
params
    -> split0_test_score
    -> split1_test_score
    -> split2_test_score
    -> split3_test_score
    -> split4_test_score
    -> split5_test_score
    -> split6_test_score
    -> split7_test_score
    -> split8_test_score
    -> split9_test_score
    -> split10_test_score
    -> split11_test_score
    -> split12_test_score
mean_test_score
std_test_score
rank_test_score

As you can see, we obtain a different result (0.046 vs. 0.053), and in some other experiments the difference is even more drastic.

Can anyone help me clear this up? It would be greatly appreciated!

Thanks.

Upvotes: 0

Views: 3118

Answers (2)

desertnaut

Reputation: 60321

UPDATE: As per OP's comment below, upgrading scikit-learn from v0.21.3 to v0.22.2 resolved the issue.


As I already mentioned in the comments, I have been unable to reproduce your issue, either with the iris data or with dummy data from several configurations of scikit-learn's make_classification.
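
For reference, here is a minimal sketch of the kind of check I mean; the dummy data, the RandomForestClassifier, and the parameter values below are only placeholders (not your XGBoost setup):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# dummy data in place of the original dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

cv = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {'n_estimators': [50, 100, 200], 'max_depth': [3, 5, None]},
    n_iter=5,
    cv=13,           # same number of splits as in your output
    random_state=0,
)
cv.fit(X, y)

# average the per-split test scores of the best candidate by hand
best_model_idx = cv.best_index_
split_scores = [v[best_model_idx] for k, v in cv.cv_results_.items()
                if k.startswith('split') and k.endswith('_test_score')]
print(np.mean(split_scores))
print(cv.best_score_)

On a recent scikit-learn version, the two printed numbers come out identical.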

Running the whole script (code + data) you have posted at Pastebin does not change this; here are the last lines of your own code:

results_in_splits = []

for k, v in cv.cv_results_.items():
    if 'split' in k:
        print('\t->', k)
        results_in_splits.append(v[best_model_idx])
    else:
        print(k)

print('\n')
print(sum(results_in_splits) / len(results_in_splits))
print(cv.best_score_)

the output being

mean_fit_time
std_fit_time
mean_score_time
std_score_time
param_subsample
param_n_estimators
param_min_child_weight
param_max_depth
param_gamma
param_colsample_bytree
params
    -> split0_test_score
    -> split1_test_score
    -> split2_test_score
    -> split3_test_score
    -> split4_test_score
    -> split5_test_score
    -> split6_test_score
    -> split7_test_score
    -> split8_test_score
    -> split9_test_score
    -> split10_test_score
    -> split11_test_score
    -> split12_test_score
mean_test_score
std_test_score
rank_test_score


0.8926320979964705
0.8926320979964705

i.e. the two scores are indeed identical, as they should be.

The almost identical scores in your CV splits, discussed in the comments of the other answer here, are also not a bug; they are an artifact of an unfortunate situation: too small a dataset (678 samples) combined with too many CV splits (13) leaves only 13-14 validation samples per split, and any statistic calculated on so few samples is spurious and should not be relied upon.

But this last observation is actually irrelevant to your main question here: what you report is not reproducible in a variety of situations, including with the script and data you provided yourself.

Upvotes: 0

Batuhan B

Reputation: 1855

RandomizedSearchCV tries to find the best parameters for your model. To do this, it trains your model again and again with cross-validation, once for each parameter combination, and calculates the mean cross-validation score for each parameter setup.

It then picks the highest mean cross-validated score and returns the best parameters and the best score of your model.

In summary:

  • It tries N different parameter combinations on your dataset.
  • For each parameter combination, it trains the model with cross-validation.
  • It then averages the fold scores from cross-validation and assigns that mean score to the corresponding parameter combination.
  • It then looks at all the results and selects the highest one.
  • Finally, it returns the best score, best model, best parameters, etc. (see the sketch below).
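
In other words, best_score_ is just the mean_test_score of the winning parameter combination, i.e. the average of its per-split test scores. Here is a minimal sketch on the iris data (the LogisticRegression estimator and the C values are placeholders, not the model from the question):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {'C': [0.01, 0.1, 1, 10]},
    n_iter=3,
    cv=5,
    random_state=42,
)
search.fit(X, y)

# the reported best score is the mean test score of the best candidate
print(search.best_score_)
print(search.cv_results_['mean_test_score'][search.best_index_])

Both prints show the same number.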

Upvotes: 2
