How to use precision recall curve in gridsearchcv?

Question

I am trying to use scikit-learn's GridSearchCV for hyperparameter tuning, and I would like to use the area under the precision-recall curve as the metric.

My GridSearchCV setup looks something like this:

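>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
>>> svc = svm.SVC()
>>> clf = GridSearchCV(svc, parameters, scoring='accuracy')
>>> clf.fit(iris.data, iris.target)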

So basically, I want to replace the string 'accuracy' with the area under the precision-recall curve. How should I customize it?

afsharov · Accepted Answer

The area under the precision-recall curve can be estimated with average_precision_score. From its documentation:

AP [Average Precision] summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.

Effectively, this is an approximation of the area under the precision-recall curve, and it is implemented in scikit-learn. There is a great blog available here that summarizes the concept behind it and also links to the Wikipedia article, where it is stated that:

[Average precision] is the area under the precision-recall curve.

To use average_precision_score in the grid search, pass 'average_precision' as the scoring argument:

clf = GridSearchCV(svc, parameters, scoring='average_precision')
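As a fuller illustration, here is a minimal sketch of that call from start to finish. It assumes a binary target, since precision and recall are defined with respect to one positive class; the synthetic dataset from make_classification and the cv=3 setting are assumptions for the example, not part of the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumption: a synthetic, imbalanced binary problem stands in for real data.
X, y = make_classification(
    n_samples=200, n_features=10, weights=[0.8, 0.2], random_state=0
)

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# 'average_precision' ranks candidates by average_precision_score,
# computed from the continuous outputs of SVC.decision_function.
clf = GridSearchCV(SVC(), parameters, scoring='average_precision', cv=3)
clf.fit(X, y)

best_params = clf.best_params_
best_score = clf.best_score_  # mean cross-validated average precision
print(best_params, best_score)
```

Because the scorer works on decision_function values, SVC does not need probability=True here.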

However, keep this important note about average_precision_score in mind:

This implementation is not interpolated and is different from computing the area under the precision-recall curve with the trapezoidal rule, which uses linear interpolation and can be too optimistic.
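If the trapezoidal area is what you want despite that caveat, one possible approach is to pass a callable as scoring. The sketch below assumes a binary target and an estimator that exposes decision_function; the name pr_auc_scorer and the synthetic data are hypothetical, chosen only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def pr_auc_scorer(estimator, X, y):
    """Hypothetical scorer: trapezoidal area under the precision-recall curve."""
    # Continuous scores are needed so the curve is traced over thresholds.
    scores = estimator.decision_function(X)
    precision, recall, _ = precision_recall_curve(y, scores)
    # auc applies the trapezoidal rule; recall is monotonic, as auc requires.
    return auc(recall, precision)


# Assumption: a synthetic, imbalanced binary problem stands in for real data.
X, y = make_classification(
    n_samples=200, n_features=10, weights=[0.8, 0.2], random_state=0
)

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# GridSearchCV accepts any callable with the signature (estimator, X, y).
clf_trapz = GridSearchCV(SVC(), parameters, scoring=pr_auc_scorer, cv=3)
clf_trapz.fit(X, y)
trapz_best_score = clf_trapz.best_score_
print(clf_trapz.best_params_, trapz_best_score)
```

The two metrics usually rank candidates similarly, but as the note says, the trapezoidal value can come out more optimistic than average precision.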
