Reputation: 525
I am trying to use sklearn's GridSearchCV for hyperparameter tuning, and I want to use the area under the precision-recall curve as the scoring metric.
The GridSearchCV call is something like:
>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
>>> svc = svm.SVC()
>>> clf = GridSearchCV(svc, parameters, scoring='accuracy')
>>> clf.fit(iris.data, iris.target)
So basically what I want is to change the string 'accuracy' into the area under the precision-recall curve. How should I customize it?
Upvotes: 3
Views: 3816
Reputation: 5164
The area under the precision-recall curve can be estimated with average_precision_score. From its documentation:
AP [Average Precision] summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
Effectively, this is an approximation of the area under the precision-recall curve and is implemented in scikit-learn. There is a great blog post available here that summarizes the concept behind it and also links to the Wikipedia article, where it is stated that:
[Average precision] is the area under the precision-recall curve.
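To see that formula in action, here is a minimal sketch (the binary labels and scores are made up purely for illustration) that reproduces average_precision_score from the output of precision_recall_curve:
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# toy binary labels and scores, made up for illustration
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5])

precision, recall, _ = precision_recall_curve(y_true, y_score)

# AP = sum_n (R_n - R_{n-1}) * P_n over the step-wise curve;
# recall is returned in decreasing order, hence the minus sign
ap_manual = -np.sum(np.diff(recall) * precision[:-1])

print(ap_manual)                                 # ~0.917 for this toy data
print(average_precision_score(y_true, y_score))  # same value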
The average_precision_score can be used by specifying average_precision as the scoring method:
clf = GridSearchCV(svc, parameters, scoring='average_precision')
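Note that the average_precision scorer expects a binary (or multilabel) target together with continuous scores, so the multiclass iris example from the question would not work with it as-is. A minimal sketch, assuming a binary problem and using the breast cancer dataset (swapped in here only for illustration):
from sklearn import svm
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV

# binary classification data; SVC's decision_function supplies the ranking scores
X, y = load_breast_cancer(return_X_y=True)

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svc = svm.SVC()

clf = GridSearchCV(svc, parameters, scoring='average_precision')
clf.fit(X, y)

print(clf.best_params_)  # parameter combination with the highest mean AP
print(clf.best_score_)   # cross-validated average precision of that combination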
However, keep this important note about average_precision_score in mind:
This implementation is not interpolated and is different from computing the area under the precision-recall curve with the trapezoidal rule, which uses linear interpolation and can be too optimistic.
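If that difference matters for your use case, you can compare the two estimates directly; a small sketch re-using the same made-up toy scores as above:
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# toy binary labels and scores, made up for illustration
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5])

precision, recall, _ = precision_recall_curve(y_true, y_score)

print(average_precision_score(y_true, y_score))  # step-wise summary, not interpolated
print(auc(recall, precision))                    # trapezoidal rule, generally a slightly different value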
Upvotes: 4