pmdaly

Reputation: 1222

Why can't I get the same results as GridSearchCV?

GridSearchCV only returns a score for each parametrization, and I would like to see an ROC curve as well to better understand the results. To do this, I would like to take the best-performing model from GridSearchCV and reproduce the same results, but also cache the probabilities. Here is my code:

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from tqdm import tqdm

import warnings
warnings.simplefilter("ignore")

data = make_classification(n_samples=100, n_features=20, n_classes=2, 
                           random_state=1, class_sep=0.1)
X, y = data


small_pipe = Pipeline([
    ('rfs', SelectFromModel(RandomForestClassifier(n_estimators=100))), 
    ('clf', LogisticRegression())
])

params = {
    'clf__class_weight': ['balanced'],
    'clf__penalty'     : ['l1', 'l2'],
    'clf__C'           : [0.1, 0.5, 1.0],
    'rfs__max_features': [3, 5, 10]
}
key_feats = ['mean_train_score', 'mean_test_score', 'param_clf__C', 
             'param_clf__penalty', 'param_rfs__max_features']

skf = StratifiedKFold(n_splits=5, random_state=0)

all_results = list()
for _ in tqdm(range(25)):
    gs = GridSearchCV(small_pipe, param_grid=params, scoring='roc_auc', cv=skf, n_jobs=-1)
    gs.fit(X, y)
    results = pd.DataFrame(gs.cv_results_)[key_feats]
    all_results.append(results)


param_group = ['param_clf__C', 'param_clf__penalty', 'param_rfs__max_features']
all_results_df = pd.concat(all_results)
all_results_df.groupby(param_group).agg(['mean', 'std']
                    ).sort_values(('mean_test_score', 'mean'), ascending=False).head(20)

Here is my attempt at reproducing the results

small_pipe_w_params = Pipeline([
    ('rfs', SelectFromModel(RandomForestClassifier(n_estimators=100), max_features=3)), 
    ('clf', LogisticRegression(class_weight='balanced', penalty='l2', C=0.1))
])
skf = StratifiedKFold(n_splits=5, random_state=0)
all_scores = list()
for _ in range(25):
    scores = list()
    for train, test in skf.split(X, y):
        small_pipe_w_params.fit(X[train, :], y[train])
        probas = small_pipe_w_params.predict_proba(X[test, :])[:, 1]
        # cache probas here to build an ROC curve with a confidence interval later
        scores.append(roc_auc_score(y[test], probas))
    all_scores.extend(scores)

print('mean: {:<1.3f}, std: {:<1.3f}'.format(np.mean(all_scores), np.std(all_scores)))
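
(Here is roughly what I plan to do with the cached probabilities later; a sketch assuming matplotlib, with plot_cv_roc being a placeholder helper rather than finished code. Each cached item would be the (y[test], probas) pair from one fold.)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

def plot_cv_roc(cached):
    """cached: list of (y_true, probas) pairs, one per held-out fold."""
    aucs = []
    for y_true, probas in cached:
        fpr, tpr, _ = roc_curve(y_true, probas)
        plt.plot(fpr, tpr, alpha=0.3, lw=1)
        aucs.append(roc_auc_score(y_true, probas))
    plt.plot([0, 1], [0, 1], linestyle='--', color='grey')
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title('ROC per fold, AUC = {:.3f} +/- {:.3f}'.format(np.mean(aucs), np.std(aucs)))
    plt.show()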

I'm running both the grid search and my reproduction loop multiple times because the results seem unstable. I created a deliberately challenging dataset, since my own dataset is just as hard to learn. The groupby is meant to average the train and test scores (and take their std) across all iterations of GridSearchCV to stabilize the results. I then pick out the best-performing model (C=0.1, penalty=l2 and max_features=3 in my most recent run) and try to reproduce these same results by setting those parameters explicitly.

The GridSearchCV model yields a mean ROC AUC of 0.63 with std 0.042, whereas my own implementation gets a mean of 0.59 with std 0.131. The grid search scores are considerably better. If I run this experiment for 100 iterations of both GridSearchCV and my own loop, the results are similar.

Why are these results not the same? Both approaches use StratifiedKFold (GridSearchCV uses it internally when an integer is supplied for cv)... and maybe GridSearchCV weights the scores by the size of each fold? I'm not sure about that, but it would make sense. Is my implementation flawed?

edit: random_state added to SKFold

Upvotes: 3

Views: 2279

Answers (1)

Venkatachalam

Reputation: 16966

If you set the random_state of the RandomForestClassifier, the variation between different GridSearchCV runs is eliminated.
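
For example, here is a sketch of a fully deterministic version of the question's grid search (my simplified variant: I added solver='liblinear' so that both penalties work on recent scikit-learn versions, and return_train_score=True so that mean_train_score is available):

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=20, n_classes=2,
                           random_state=1, class_sep=0.1)

# random_state=0 on the forest pins the feature-selection step, so repeated
# GridSearchCV runs select the same features and produce identical cv_results_.
deterministic_pipe = Pipeline([
    ('rfs', SelectFromModel(RandomForestClassifier(n_estimators=10, random_state=0))),
    ('clf', LogisticRegression(class_weight='balanced', solver='liblinear',
                               random_state=0))
])

params = {
    'clf__penalty': ['l1', 'l2'],
    'clf__C': [0.1, 0.5, 1.0],
    'rfs__max_features': [3, 5, 10]
}

# shuffle is off by default, so the stratified splits are deterministic as well
skf = StratifiedKFold(n_splits=5)

gs = GridSearchCV(deterministic_pipe, param_grid=params, scoring='roc_auc',
                  cv=skf, return_train_score=True, n_jobs=-1)
gs.fit(X, y)
print(pd.DataFrame(gs.cv_results_)[['mean_train_score', 'mean_test_score']].head())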

For simplification, I have set n_estimators=10 and got the following result:

                                                        mean_train_score       mean_test_score
                                                            mean       std         mean    std
param_clf__C  param_clf__penalty  param_rfs__max_features
1.0           l2                  5                     0.766701  0.000000     0.580727    0.0
                                  10                    0.768849  0.000000     0.577737    0.0

Now, if we look at the performance on each split (by removing the key_feats filtering) for the best hyperparameters, using

all_results_df.sort_values(('mean_test_score'), ascending=False).head(1).T

we will get

    16
mean_fit_time   0.228381
mean_score_time 0.113187
mean_test_score 0.580727
mean_train_score    0.766701
param_clf__C    1
param_clf__class_weight balanced
param_clf__penalty  l2
param_rfs__max_features 5
params  {'clf__class_weight': 'balanced', 'clf__penalt...
rank_test_score 1
split0_test_score   0.427273
split0_train_score  0.807051
split1_test_score   0.47
split1_train_score  0.791745
split2_test_score   0.54
split2_train_score  0.789243
split3_test_score   0.78
split3_train_score  0.769856
split4_test_score   0.7
split4_train_score  0.67561
std_fit_time    0.00586908
std_score_time  0.00152781
std_test_score  0.13555
std_train_score 0.0470554

Let us reproduce this!

skf = StratifiedKFold(n_splits=5, random_state=0)

scores = []
weights = []


for train, test in skf.split(X, y):
    small_pipe_w_params = Pipeline([
                ('rfs', SelectFromModel(RandomForestClassifier(n_estimators=10, 
                                                               random_state=0),max_features=5)), 
                ('clf', LogisticRegression(class_weight='balanced', penalty='l2', C=1.0,random_state=0))
            ])
    small_pipe_w_params.fit(X[train, :], y[train])
    probas = small_pipe_w_params.predict_proba(X[test, :])
    # cache probas here to build an ROC curve with a confidence interval later
    scores.append(roc_auc_score(y[test], probas[:,1]))
    weights.append(len(test))

print(scores)
print('mean: {:<1.6f}, std: {:<1.3f}'.format(np.average(scores, axis=0, weights=weights), np.std(scores)))

[0.42727272727272736, 0.47, 0.54, 0.78, 0.7]
mean: 0.580727, std: 0.135

Note: mean_test_score is not just a simple average; it is a weighted mean. The reason is the iid parameter.

From the documentation:

iid : boolean, default='warn'
    If True, return the average score across folds, weighted by the number of samples in each test set. In this case, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, and not the mean loss across the folds. If False, return the average score across folds. Default is True, but will change to False in version 0.21, to correspond to the standard definition of cross-validation.

Changed in version 0.20: Parameter iid will change from True to False by default in version 0.22, and will be removed in 0.24.
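
To make the difference concrete, here is a small sketch with hypothetical per-fold scores and fold sizes (the numbers are made up; only the weighting logic matters):

import numpy as np

fold_scores = np.array([0.60, 0.55, 0.70])  # hypothetical per-fold ROC AUC scores
fold_sizes = np.array([30, 30, 40])         # hypothetical test-fold sizes

simple_mean = fold_scores.mean()                             # iid=False behaviour
weighted_mean = np.average(fold_scores, weights=fold_sizes)  # iid=True behaviour
print(simple_mean, weighted_mean)  # 0.6166666666666667 0.625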

Upvotes: 2
