renakre

Reputation: 8291

Why does cross-validation perform worse than testing?

In the following code I fit a LogisticRegressionCV model on the X_test (features) and y_test (labels) data.

Then, using the same data, I apply cross_val_predict with 10 folds to assess the performance with CV. I calculate two different AUC scores: one with the roc_auc_score method on the predicted labels, and another with the auc method on the predicted probabilities.

# CV LOGISTIC REGRESSION
import numpy as np
import sklearn.metrics
import sklearn.model_selection
from sklearn import linear_model

classifier = linear_model.LogisticRegressionCV(penalty='l1', class_weight='balanced', tol=0.01, Cs=[0.1],
                                               max_iter=4000, solver='liblinear', random_state=42, cv=10)
classifier.fit(X_test, y_test)

# AUC from cross-validated hard (0/1) predictions
predicted = sklearn.model_selection.cross_val_predict(classifier, X_test, y_test, cv=10)
print("AUC1: {}".format(sklearn.metrics.roc_auc_score(y_test, predicted)))

# AUC from cross-validated predicted probabilities
probas_ = sklearn.model_selection.cross_val_predict(classifier, X_test, y_test, cv=10, method='predict_proba')
fpr, tpr, thresholds = sklearn.metrics.roc_curve(y_test, probas_[:, 1])
roc_auc = sklearn.metrics.auc(fpr, tpr)
print("AUC2: {}".format(roc_auc))

The AUC scores are 0.624 and 0.654, respectively.
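(As an aside on why these two numbers differ: roc_auc_score on hard 0/1 predictions only sees the labels after thresholding at 0.5, while the roc_curve/auc route uses the full probability ranking. A minimal sketch with synthetic data, where make_classification and a plain LogisticRegression are only illustrative stand-ins for my actual data and model:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic data, purely to illustrate the difference between the two AUC variants
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

auc_on_labels = roc_auc_score(y_te, model.predict(X_te))               # labels thresholded at 0.5
auc_on_probas = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])   # full probability ranking
print(auc_on_labels, auc_on_probas)   # typically not equal

)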

Then I build another LogisticRegression model, this time using GridSearchCV. The model is trained on the same training data (used in CV), but this time it predicts the test data:

## GRIDSEARCHCV LOGISTIC REGRESSION
param_grid = {'C': np.logspace(-2, 2, 40)}

# Create grid search object
clf = sklearn.model_selection.GridSearchCV(linear_model.LogisticRegression(penalty='l1',
                                                                           class_weight='balanced',
                                                                           solver='liblinear',
                                                                           max_iter=4000,
                                                                           random_state=42),
                                           param_grid=param_grid,
                                           cv=5,
                                           scoring='roc_auc',
                                           verbose=True,
                                           n_jobs=-1)
best_clf = clf.fit(X_train, y_train)
predicted = best_clf.predict(X_test)

# best_score_ is the mean cross-validated ROC AUC on the training data
print("AUC1: {}".format(best_clf.best_score_))

# AUC on the held-out test data, from predicted probabilities
probas_ = best_clf.predict_proba(X_test)
fpr, tpr, thresholds = sklearn.metrics.roc_curve(y_test, probas_[:, 1])
roc_auc = sklearn.metrics.auc(fpr, tpr)
print("AUC2: {}".format(roc_auc))

This time the AUC scores are 0.603 and 0.688, respectively.

That is, one outperforms the other depending on which AUC score is used. This post recommends the second AUC score I reported here. But then I do not understand how CV can perform worse, even though it is trained and tested on the same data.

Any ideas? Do you think this is normal (and if so, why)? Also, I wonder if my code looks fine. I appreciate your suggestions.

Upvotes: 1

Views: 66

Answers (1)

Onlyfood

Reputation: 137

Well, I think you need to use the training data for the CV rather than the test data. Your first model (the LogisticRegressionCV classifier) is fitted on X_test and y_test, and so is the cross-validation step that produces predicted.

Since the test set usually has fewer instances (rows) than the training set, it may simply be that the model is underfitting due to the smaller amount of data.

Try doing all of it on the training set; the test set is usually reserved for prediction only, and fitting on the test set defeats its purpose of checking the model's performance on unfitted (unseen) data. A rough sketch of that workflow is below.
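Here is a minimal sketch of that split, assuming the same X_train/X_test/y_train/y_test variables and roughly the same grid as in your question (the exact parameters are just carried over for illustration, not a recommendation):

# Sketch: tune/cross-validate on the training data only, then score once on the test data.
import numpy as np
import sklearn.metrics
import sklearn.model_selection
from sklearn import linear_model

param_grid = {'C': np.logspace(-2, 2, 40)}
clf = sklearn.model_selection.GridSearchCV(
    linear_model.LogisticRegression(penalty='l1', class_weight='balanced',
                                    solver='liblinear', max_iter=4000, random_state=42),
    param_grid=param_grid, cv=5, scoring='roc_auc', n_jobs=-1)

clf.fit(X_train, y_train)                        # all fitting and tuning happens on the training set
print("CV ROC AUC (training folds):", clf.best_score_)

probas_ = clf.predict_proba(X_test)[:, 1]        # the test set is only used for the final evaluation
print("Test ROC AUC:", sklearn.metrics.roc_auc_score(y_test, probas_))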

Good luck~

Upvotes: 2
