Karthick

Reputation: 4766

Unexpected behaviour of evaluation metrics (precision etc.) in sklearn

from sklearn.metrics import precision_score
a = [1, 2, 1, 1, 2]
b = [1, 2, 2, 1, 1]

print precision_score(a, b, labels=[1])
# 0.6666
print precision_score(a, b, labels=[2])
# 0.5
print precision_score(a, b, labels=[1, 2])
# 0.6666

Why are the values the same for the first and last cases?

Calculating by hand, the total precision should be 3/5 = 0.6, but the third case outputs 0.6666, which happens to be the value of the first one.
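
To spell out the hand calculation: the overall (micro-averaged) precision just counts how many predictions in b match the corresponding entry of a.

a = [1, 2, 1, 1, 2]  # true labels
b = [1, 2, 2, 1, 1]  # predicted labels

correct = sum(1 for t, p in zip(a, b) if t == p)  # positions 0, 1 and 3 -> 3 matches
print(float(correct) / len(b))                    # 3 / 5 = 0.6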

Edit 1: Added the import path to the function in question.

Upvotes: 2

Views: 291

Answers (2)

Andy Rimmer

Reputation: 2111

See here for the documentation: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html#sklearn.metrics.precision_score. I think you need to change the average argument to 'micro' to get the overall precision across the specified labels, i.e.:

print precision_score(a, b, labels=[1, 2], average='micro')

The default value for average is 'weighted', which computes a weighted average of precision over the specified labels. If you use 'micro', according to the documentation, it computes the precision over all true and false positives (presumably "all" means all of the specified labels, but the documentation is not clear on this). I think this is what you want? I have not been able to check this, as I don't know which version of scikit-learn you're using.
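
Something along these lines should show the difference between the averaging modes; I'm assuming a scikit-learn version where average accepts 'micro', 'weighted' and None (None returns one value per label):

from sklearn.metrics import precision_score

a = [1, 2, 1, 1, 2]  # true labels
b = [1, 2, 2, 1, 1]  # predicted labels

# One precision value per label: [0.6667, 0.5] for labels 1 and 2
print(precision_score(a, b, labels=[1, 2], average=None))

# 'micro': pool true/false positives across both labels -> 3/5 = 0.6
print(precision_score(a, b, labels=[1, 2], average='micro'))

# 'weighted': per-label precision weighted by support -> (3*0.6667 + 2*0.5)/5 = 0.6
print(precision_score(a, b, labels=[1, 2], average='weighted'))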

Upvotes: 1

Fred Foo

Reputation: 363487

You have to tell precision_score for which label it should compute the precision. What you're seeing is the precision for label 1:

>>> precision_score(a, b)
0.66666666666666663
>>> precision_score(a, b, pos_label=1)
0.66666666666666663

But you want the precision for label 2:

>>> precision_score(a, b, pos_label=2)
0.5
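
You can check that 0.5 by hand: label 2 is predicted twice in b, and only one of those predictions is correct.

a = [1, 2, 1, 1, 2]  # true labels
b = [1, 2, 2, 1, 1]  # predicted labels

predicted_2 = [i for i, p in enumerate(b) if p == 2]  # positions 1 and 2
correct_2 = [i for i in predicted_2 if a[i] == 2]     # only position 1
print(float(len(correct_2)) / len(predicted_2))       # 1 / 2 = 0.5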

Upvotes: 1
