Xudong

Reputation: 525

How to use the precision-recall curve in GridSearchCV?

I am trying to use scikit-learn's GridSearchCV for hyperparameter tuning, and I would like to use the area under the precision-recall curve (precision_recall_curve) as the metric.

My GridSearchCV call looks something like this:

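>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svc = svm.SVC()
>>> clf = GridSearchCV(svc, parameters, scoring='accuracy')
>>> clf.fit(iris.data, iris.target)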

In other words, I want to replace the string 'accuracy' with the area under the precision-recall curve. How should I customize it?

Upvotes: 3

Views: 3816

Answers (1)

afsharov

Reputation: 5164

The area under the precision-recall curve can be estimated by the average_precision_score. From its documentation:

AP [Average Precision] summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.

Effectively, this is an approximation of the area under the precision-recall curve, and it is implemented in scikit-learn. A good blog post summarizes the concept behind it and links to the Wikipedia article, which states:

[Average precision] is the area under the precision-recall curve.
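To see how the weighted-mean definition quoted above maps to code, here is a minimal sketch on a small made-up binary example (the labels and scores are invented for illustration). It applies the formula given in the average_precision_score documentation, AP = sum_n (R_n - R_{n-1}) * P_n, to the output of precision_recall_curve and compares the result with scikit-learn's own value:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Small hypothetical binary example: true labels and classifier scores
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.55])

# precision_recall_curve returns recall in decreasing order
precision, recall, _ = precision_recall_curve(y_true, y_score)

# AP = sum_n (R_n - R_{n-1}) * P_n: each precision is weighted by the
# increase in recall; the minus sign compensates for the decreasing order
manual_ap = -np.sum(np.diff(recall) * precision[:-1])

sklearn_ap = average_precision_score(y_true, y_score)
print(manual_ap, sklearn_ap)
```

Because no interpolation happens between the points, this is a step-function sum rather than a trapezoidal integral.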

To use average_precision_score in GridSearchCV, pass 'average_precision' as the scoring method:

clf = GridSearchCV(svc, parameters, scoring='average_precision')
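A self-contained version of the grid search from the question with this scoring string might look like the sketch below. It assumes a binary target, so it swaps the iris data for a synthetic binary problem generated with make_classification; the dataset and its parameters are illustrative only.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

# Synthetic binary problem (assumption: the real data has a binary target)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svc = svm.SVC()

# 'average_precision' wraps average_precision_score; the scorer uses the
# SVC's decision_function, so probability=True is not required
clf = GridSearchCV(svc, parameters, scoring='average_precision')
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)
```

clf.best_score_ is then the mean cross-validated average precision of the best parameter combination, and clf.cv_results_ holds the scores of every candidate.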

However, keep this important note about average_precision_score in mind:

This implementation is not interpolated and is different from computing the area under the precision-recall curve with the trapezoidal rule, which uses linear interpolation and can be too optimistic.
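If you specifically want the trapezoidal area rather than average precision, one option is to pass a callable scorer to GridSearchCV. The sketch below rests on several assumptions: pr_auc_trapezoid is a hypothetical helper, it relies on the estimator exposing decision_function (true for SVC), and the data is synthetic. It uses sklearn.metrics.auc, which applies the trapezoidal rule, and then computes both summaries for the selected model so you can compare them:

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import auc, average_precision_score, precision_recall_curve
from sklearn.model_selection import GridSearchCV


def pr_auc_trapezoid(estimator, X, y):
    """Hypothetical custom scorer: trapezoidal area under the PR curve."""
    # Continuous scores for the positive class (SVC exposes decision_function)
    scores = estimator.decision_function(X)
    precision, recall, _ = precision_recall_curve(y, scores)
    # auc() uses the trapezoidal rule, i.e. linear interpolation between points
    return auc(recall, precision)


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# Any callable with the signature (estimator, X, y) -> float can be used as scoring
clf = GridSearchCV(svm.SVC(), parameters, scoring=pr_auc_trapezoid)
clf.fit(X, y)

# Compare both summaries for the best model (on the training data, for illustration)
scores = clf.best_estimator_.decision_function(X)
ap = average_precision_score(y, scores)
precision, recall, _ = precision_recall_curve(y, scores)
trapezoid = auc(recall, precision)
print(ap, trapezoid)
```

The two numbers are usually close but not identical; given the note above, average_precision is the safer default unless you have a reason to prefer interpolation.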

Upvotes: 4
