Reputation: 135
I'm using BayesSearchCV from scikit-optimize to train a model on a fairly imbalanced dataset. From what I've read, precision or ROC AUC would be the best metrics for an imbalanced dataset. In my code:
knn_b = BayesSearchCV(estimator=pipe, search_spaces=search_space, n_iter=40, random_state=7, scoring='roc_auc')
knn_b.fit(X_train, y_train)
The number of iterations is just an arbitrary value I chose (although I get a warning saying I've already reached the best result, and as far as I'm aware there is no way to early-stop?). For the scoring parameter I specified roc_auc, which I'm assuming will be the primary metric used to pick the best parameters in the results. So when I call knn_b.best_params_, I should get the parameters for which the roc_auc score is highest. Is that correct?
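For reference, this is roughly how I've been inspecting the selection (assuming BayesSearchCV exposes the usual scikit-learn search attributes best_params_, best_score_, and best_index_):
# best_score_ should be the mean cross-validated roc_auc achieved by best_params_,
# i.e. the same value as cv_results_['mean_test_score'] at best_index_
print(knn_b.best_params_)
print(knn_b.best_score_)
print(knn_b.cv_results_['mean_test_score'][knn_b.best_index_])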
My confusion is when I look at the results using knn_b.cv_results_. Shouldn't mean_test_score be the roc_auc score, because of the scoring param in the BayesSearchCV class? What I'm doing is plotting the results and seeing how each combination of params performed.
sns.relplot(
data=knn_b.cv_results_, kind='line', x='param_classifier__n_neighbors', y='mean_test_score',
hue='param_scaler', col='param_classifier__p',
)
When I try to use the roc_auc_score() function on the true and predicted values, I get something completely different. Is the mean_test_score here something different? How would I be able to get the individual/mean roc_auc score of each CV split of each iteration? Similarly for when I want to use RandomizedSearchCV or GridSearchCV.
EDIT: tl;dr: I want to know what exactly is being computed in mean_test_score. I thought it was roc_auc because of the scoring param, or accuracy, but it seems to be neither.
Upvotes: 0
Views: 2104
Reputation: 12602
mean_test_score is the AUROC, because of your scoring parameter, yes.
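Every *_test_score entry in cv_results_ is computed with that scorer, so you can read the per-split AUROC for every candidate straight out of the dict. A minimal sketch (assuming the usual split<k>_test_score keys that scikit-learn's search CV classes produce, with pandas used only for display):
import pandas as pd

results = pd.DataFrame(knn_b.cv_results_)
# one AUROC per fold for every parameter combination, plus their mean/std
split_cols = [c for c in results.columns
              if c.startswith('split') and c.endswith('_test_score')]
print(results[split_cols + ['mean_test_score', 'std_test_score', 'params']])
The same applies to RandomizedSearchCV and GridSearchCV, since they share the cv_results_ format.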
Your main problem is that the ROC curve (and the area under it) requires probability predictions (or another continuous score), not hard class predictions. Your manual calculation is thus incorrect.
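For example (a sketch, assuming a binary target, refit=True (the default) so the fitted search delegates prediction to the best pipeline, and a classifier that supports predict_proba; X_test/y_test stand for whatever held-out data you scored):
from sklearn.metrics import roc_auc_score

# correct: rank by the predicted probability of the positive class
auc_from_proba = roc_auc_score(y_test, knn_b.predict_proba(X_test)[:, 1])

# what you likely computed: AUROC of hard 0/1 predictions, which throws away
# the ranking information and generally gives a different number
auc_from_labels = roc_auc_score(y_test, knn_b.predict(X_test))

print(auc_from_proba, auc_from_labels)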
You shouldn't expect exactly the same score anyway. Your second score is on the test set, and the first score is optimistically biased by the hyperparameter selection.
Upvotes: 1