renakre

Reputation: 8291

Why does cross-validation perform worse than testing?

In the following code I fit a LogisticRegressionCV model on the X_test (features) and y_test (labels) data.

Then, using the same data, I apply cross_val_predict with 10 folds to assess the performance with CV. I calculate two different AUC scores: one with the roc_auc_score method on the predicted labels, and another with the auc method on the predicted probabilities.

# CV LOGISTIC REGRESSION
import numpy as np
import sklearn.metrics
import sklearn.model_selection
from sklearn import linear_model

classifier = linear_model.LogisticRegressionCV(penalty='l1', class_weight='balanced', tol=0.01, Cs=[0.1],
                                               max_iter=4000, solver='liblinear', random_state=42, cv=10)
classifier.fit(X_test, y_test)

# AUC from cross-validated hard (0/1) predictions
predicted = sklearn.model_selection.cross_val_predict(classifier, X_test, y_test, cv=10)
print("AUC1: {}".format(sklearn.metrics.roc_auc_score(y_test, predicted)))

# AUC from cross-validated predicted probabilities
probas_ = sklearn.model_selection.cross_val_predict(classifier, X_test, y_test, cv=10, method='predict_proba')
fpr, tpr, thresholds = sklearn.metrics.roc_curve(y_test, probas_[:, 1])
roc_auc = sklearn.metrics.auc(fpr, tpr)
print("AUC2: {}".format(roc_auc))

The AUC scores are 0.624 and 0.654, respectively.
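(As an aside on why these two numbers differ: roc_auc_score on hard 0/1 predictions only sees the labels after thresholding at 0.5, while the roc_curve/auc route uses the full probability ranking. A minimal sketch with synthetic data, where make_classification and a plain LogisticRegression are only illustrative stand-ins for my actual data and model:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic data, purely to illustrate the difference between the two AUC variants
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

auc_on_labels = roc_auc_score(y_te, model.predict(X_te))               # labels thresholded at 0.5
auc_on_probas = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])   # full probability ranking
print(auc_on_labels, auc_on_probas)   # typically not equal

)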

Then I build another LogisticRegression model, this time using GridSearchCV. The model is trained on the same training data (used in CV), but this time it predicts the test data:

## GRIDSEARCHCV LOGISTIC REGRESSION
param_grid = {'C': np.logspace(-2, 2, 40)}

# Create grid search object
clf = sklearn.model_selection.GridSearchCV(linear_model.LogisticRegression(penalty='l1',
                                                                           class_weight='balanced',
                                                                           solver='liblinear',
                                                                           max_iter=4000,
                                                                           random_state=42),
                                           param_grid=param_grid,
                                           cv=5,
                                           scoring='roc_auc',
                                           verbose=True,
                                           n_jobs=-1)
best_clf = clf.fit(X_train, y_train)
predicted = best_clf.predict(X_test)

# best_score_ is the mean cross-validated ROC AUC on the training data
print("AUC1: {}".format(best_clf.best_score_))

# AUC on the held-out test data, from predicted probabilities
probas_ = best_clf.predict_proba(X_test)
fpr, tpr, thresholds = sklearn.metrics.roc_curve(y_test, probas_[:, 1])
roc_auc = sklearn.metrics.auc(fpr, tpr)
print("AUC2: {}".format(roc_auc))

This time the AUC scores are 0.603 and 0.688, respectively.

That is, one outperforms the other depending on which AUC score is used. This post recommends the second AUC score I reported here. But then I do not understand how CV can perform worse, even though it is trained and tested on the same data.

Any ideas? Do you think this is normal (and if so, why)? Also, I wonder if my code looks fine. I appreciate your suggestions.

Upvotes: 1

Views: 66

Answers (1)

Onlyfood

Reputation: 137

Well, I think you need to use the training data for the CV rather than the test data. Your first model (the LogisticRegressionCV classifier) is fitted on X_test and y_test, and so is the cross-validation step that produces predicted.

Since the test set usually has fewer instances (rows) than the training set, it may simply be that the model is underfitting due to the smaller amount of data.

Try doing all of it on the training set; the test set is usually reserved for prediction only, and fitting on the test set defeats its purpose of checking the model's performance on unfitted (unseen) data. A rough sketch of that workflow is below.
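Here is a minimal sketch of that split, assuming the same X_train/X_test/y_train/y_test variables and roughly the same grid as in your question (the exact parameters are just carried over for illustration, not a recommendation):

# Sketch: tune/cross-validate on the training data only, then score once on the test data.
import numpy as np
import sklearn.metrics
import sklearn.model_selection
from sklearn import linear_model

param_grid = {'C': np.logspace(-2, 2, 40)}
clf = sklearn.model_selection.GridSearchCV(
    linear_model.LogisticRegression(penalty='l1', class_weight='balanced',
                                    solver='liblinear', max_iter=4000, random_state=42),
    param_grid=param_grid, cv=5, scoring='roc_auc', n_jobs=-1)

clf.fit(X_train, y_train)                        # all fitting and tuning happens on the training set
print("CV ROC AUC (training folds):", clf.best_score_)

probas_ = clf.predict_proba(X_test)[:, 1]        # the test set is only used for the final evaluation
print("Test ROC AUC:", sklearn.metrics.roc_auc_score(y_test, probas_))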

Good luck~

Upvotes: 2
