Reputation: 503
I would like to use the AUC of the Precision-Recall curve as a metric to train my model. Do I need to make a specific scorer for this when using cross-validation?
Consider the below reproducible example. Note the imbalanced target variable.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold, cross_val_score
# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42, weights=[.95])
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=2)
def evaluate_model(X, y, model):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
    scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
    return scores
model = LogisticRegression(solver='liblinear')
scores = evaluate_model(X=trainX, y=trainy, model=model)
scores
I don't believe the roc_auc scorer is measuring the AUC for the Precision-Recall curve. How could one implement this scorer for cross-validation?
Upvotes: 3
Views: 2738
Reputation: 12738
"Average precision" is what you probably want, measuring a non-interpolated area under the PR curve. See the last few paragraphs of this example and this section of the User Guide.
For the scorer, use "average_precision"
; the metric function is average_precision_score
.
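As a minimal sketch, reusing the model, CV scheme, and train/test split from the question, the only change needed is the scoring string passed to cross_val_score; the metric function can also be applied directly to held-out probabilities:
from sklearn.metrics import average_precision_score
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
# same CV scheme as in the question, scored with average precision (PR AUC)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
pr_auc_scores = cross_val_score(model, trainX, trainy, scoring='average_precision', cv=cv, n_jobs=-1)
# the underlying metric, computed from predicted probabilities on the test set
model.fit(trainX, trainy)
ap = average_precision_score(testy, model.predict_proba(testX)[:, 1])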
Upvotes: 5