Francisco

Reputation: 503

Sklearn -> Using Precision Recall AUC as a scoring metric in cross validation

I would like to use the AUC of the Precision-Recall curve as a metric to train my model. Do I need to make a specific scorer for this when using cross-validation?

Consider the below reproducible example. Note the imbalanced target variable.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold, cross_val_score

# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42, weights=[.95])
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=2)

def evaluate_model(X, y, model):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
    scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
    return scores

model = LogisticRegression(solver='liblinear')
scores = evaluate_model(X=trainX, y=trainy, model=model)
scores

I don't believe the roc_auc scorer is measuring the AUC for the Precision-Recall curve. How could one implement this scorer for cross-validation?

Upvotes: 3

Views: 2738

Answers (1)

Ben Reiniger

Reputation: 12738

"Average precision" is what you probably want, measuring a non-interpolated area under the PR curve. See the last few paragraphs of this example and this section of the User Guide.

For the scorer, use "average_precision"; the metric function is average_precision_score.
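As a sketch reusing the question's own setup (only the scoring string changes), cross-validating on average precision could look like this; the final lines also show the equivalent metric function on a held-out split:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold, cross_val_score
from sklearn.metrics import average_precision_score

# same imbalanced dataset as in the question
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42, weights=[.95])
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=2)

model = LogisticRegression(solver='liblinear')
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)

# 'average_precision' scores each fold with average_precision_score,
# i.e. the non-interpolated area under the precision-recall curve
scores = cross_val_score(model, trainX, trainy, scoring='average_precision', cv=cv, n_jobs=-1)
print(scores.mean(), scores.std())

# the same metric computed directly from predicted probabilities on the test split
model.fit(trainX, trainy)
probs = model.predict_proba(testX)[:, 1]
print(average_precision_score(testy, probs))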

Upvotes: 5
