Reputation: 135
I'm using BayesSearchCV from scikit-optimize to train a model on a fairly imbalanced dataset. From what I've read, precision or ROC AUC would be the best metrics for an imbalanced dataset. In my code:
knn_b = BayesSearchCV(estimator=pipe, search_spaces=search_space, n_iter=40, random_state=7, scoring='roc_auc')
knn_b.fit(X_train, y_train)
The number of iterations is just an arbitrary value I chose (although I get a warning saying I've already reached the best result, and as far as I'm aware there is no way to early-stop?). For the scoring parameter I specified roc_auc, which I'm assuming will be the primary metric used to pick the best parameters in the results. So when I call knn_b.best_params_, I should get the parameters for which the roc_auc score is highest. Is that correct?
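For reference, this is roughly how I've been inspecting the selection (assuming BayesSearchCV exposes the usual scikit-learn search attributes best_params_, best_score_, and best_index_):
# best_score_ should be the mean cross-validated roc_auc achieved by best_params_,
# i.e. the same value as cv_results_['mean_test_score'] at best_index_
print(knn_b.best_params_)
print(knn_b.best_score_)
print(knn_b.cv_results_['mean_test_score'][knn_b.best_index_])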
My confusion is when I look at the results using knn_b.cv_results_. Shouldn't mean_test_score be the roc_auc score, because of the scoring param in the BayesSearchCV class? What I'm doing is plotting the results and seeing how each combination of params performed.
sns.relplot(
data=knn_b.cv_results_, kind='line', x='param_classifier__n_neighbors', y='mean_test_score',
hue='param_scaler', col='param_classifier__p',
)
When I try to use the roc_auc_score() function on the true and predicted values, I get something completely different. Is the mean_test_score here something different? How would I be able to get the individual/mean roc_auc score of each CV split of each iteration? Similarly for when I want to use RandomizedSearchCV or GridSearchCV.
EDIT: tl;dr: I want to know what exactly is being computed in mean_test_score. I thought it was roc_auc because of the scoring param, or accuracy, but it seems to be neither.
Upvotes: 0
Views: 2104
Reputation: 12602
mean_test_score is the AUROC, because of your scoring parameter, yes.
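Every *_test_score entry in cv_results_ is computed with that scorer, so you can read the per-split AUROC for every candidate straight out of the dict. A minimal sketch (assuming the usual split<k>_test_score keys that scikit-learn's search CV classes produce, with pandas used only for display):
import pandas as pd

results = pd.DataFrame(knn_b.cv_results_)
# one AUROC per fold for every parameter combination, plus their mean/std
split_cols = [c for c in results.columns
              if c.startswith('split') and c.endswith('_test_score')]
print(results[split_cols + ['mean_test_score', 'std_test_score', 'params']])
The same applies to RandomizedSearchCV and GridSearchCV, since they share the cv_results_ format.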
Your main problem is that the ROC curve (and the area under it) requires probability predictions (or another continuous score), not hard class predictions. Your manual calculation is thus incorrect.
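For example (a sketch, assuming a binary target, refit=True (the default) so the fitted search delegates prediction to the best pipeline, and a classifier that supports predict_proba; X_test/y_test stand for whatever held-out data you scored):
from sklearn.metrics import roc_auc_score

# correct: rank by the predicted probability of the positive class
auc_from_proba = roc_auc_score(y_test, knn_b.predict_proba(X_test)[:, 1])

# what you likely computed: AUROC of hard 0/1 predictions, which throws away
# the ranking information and generally gives a different number
auc_from_labels = roc_auc_score(y_test, knn_b.predict(X_test))

print(auc_from_proba, auc_from_labels)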
You shouldn't expect exactly the same score anyway. Your second score is on the test set, and the first score is optimistically biased by the hyperparameter selection.
Upvotes: 1