mlengg

Reputation: 113

scikit learn Grid Cross Validation returning incorrect mean

I used GridSearchCV to run cross-validation across k folds while tuning my hyperparameters. The mean test scores in the cv_results_ attribute, which should be the mean over the individual folds, look wrong. Here is my code:

    gscv = GridSearchCV(n_jobs=n_jobs, cv=train_test_iterable, estimator=pipeline,
                        param_grid=param_grid, verbose=10,
                        scoring=['accuracy', 'precision', 'recall', 'f1'], refit='f1',
                        return_train_score=return_train_score, error_score=error_score)
    gscv.fit(X, Y)
    gscv.cv_results_

The cv_results_ attribute contains the following (displayed as a table):

    mean_test_f1    split0_test_f1  split1_test_f1  Actual Mean
    0.934310796     0.935603198     0.933665455     0.934634326
    0.931279716     0.908430118     0.942689316     0.925559717
    0.927683609     0.912005672     0.935512149     0.923758911
    0.680908006     0.741198823     0.650802701     0.696000762
    0.680908006     0.741198823     0.650802701     0.696000762
    0.646005028     0.684483208     0.626791532     0.65563737
    0.840273248     0.847484083     0.836672627     0.842078355
    0.837160828     0.847484083     0.832006068     0.839745075
    0.833637        0.842109375     0.829406448     0.835757911

You can see above that "mean_test_f1" is not the mean of the two folds "split0_test_f1" and "split1_test_f1". The actual mean, which I computed myself, is in the last column.

Note: F1 means the f1-score.
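
For reference, here is a minimal sketch of roughly how I computed the "Actual Mean" column from the fitted gscv object above (a plain, unweighted average of the two split columns):

    import numpy as np

    results = gscv.cv_results_

    # plain (unweighted) average of the per-fold f1 scores;
    # this is the "Actual Mean" column in the table above
    plain_mean = np.mean([results['split0_test_f1'],
                          results['split1_test_f1']], axis=0)

    print(plain_mean)               # matches the "Actual Mean" column
    print(results['mean_test_f1'])  # does not match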

Did anyone face similar issues?

Upvotes: 2

Views: 266

Answers (2)

Vivek Kumar

Reputation: 36619

Try setting iid=False in GridSearchCV(...) and compare.

According to documentation:

iid : boolean, default=True

    If True, the data is assumed to be identically distributed across 
    the folds, and the loss minimized is the total loss per sample,
    and not the mean loss across the folds.

So when iid is True (the default), the averaging of test scores includes a per-fold weight, as can be seen in the source code:

    _store('test_%s' % scorer_name, test_scores[scorer_name],
           splits=True, rank=True,
           weights=test_sample_counts if iid else None)
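
In other words, with iid=True the reported mean is a weighted average, using the number of test samples in each fold as weights. Here is a minimal sketch of the difference, based on the first row of the table in the question (the fold sizes are hypothetical; substitute the sizes of your own two test folds):

    import numpy as np

    # per-fold f1 scores from the first row of the question's table
    fold_scores = np.array([0.935603198, 0.933665455])

    # hypothetical test-fold sizes -- replace with your actual fold sizes
    test_sample_counts = np.array([400, 600])

    # what GridSearchCV reports as mean_test_f1 when iid=True
    weighted_mean = np.average(fold_scores, weights=test_sample_counts)

    # the plain average ("Actual Mean" in the question)
    unweighted_mean = np.average(fold_scores)

    print(weighted_mean, unweighted_mean)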

Please note that train scores are not affected by this weighting, so you can also cross-check against the mean of the train scores.
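
To verify, you could rerun the search with iid=False and compare the resulting mean_test_f1 against your hand-computed averages. A sketch reusing the objects from the question (note: the iid parameter only exists in older scikit-learn releases; it was deprecated in 0.22 and removed in 0.24, after which the unweighted mean is the only behaviour):

    gscv_unweighted = GridSearchCV(n_jobs=n_jobs, cv=train_test_iterable, estimator=pipeline,
                                   param_grid=param_grid, verbose=10,
                                   scoring=['accuracy', 'precision', 'recall', 'f1'], refit='f1',
                                   return_train_score=return_train_score, error_score=error_score,
                                   iid=False)
    gscv_unweighted.fit(X, Y)

    # mean_test_f1 should now equal the plain average of the split*_test_f1 columns
    gscv_unweighted.cv_results_['mean_test_f1']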

Upvotes: 1

tangy

Reputation: 3276

I think what you're seeing is a weighted mean, not a direct average.

Upvotes: 1
