Reputation: 1725
I'm wondering how to calculate precision and recall measures for multiclass multilabel classification, i.e. classification where there are more than two labels, and where each instance can have multiple labels?
Upvotes: 32
Views: 33666
Reputation: 9859
In python using sklearn
and numpy
:
from sklearn.metrics import confusion_matrix
import numpy as np
labels = ...
predictions = ...
cm = confusion_matrix(labels, predictions)
recall = np.diag(cm) / np.sum(cm, axis = 1)
precision = np.diag(cm) / np.sum(cm, axis = 0)
Upvotes: 4
Reputation: 61910
For multi-label classification you have two ways to go First consider the following.
The metrics are computed in a per datapoint manner. For each predicted label its only its score is computed, and then these scores are aggregated over all the datapoints.
There are other metrics as well.
Here the things are done labels-wise. For each label the metrics (eg. precision, recall) are computed and then these label-wise metrics are aggregated. Hence, in this case you end up computing the precision/recall for each label over the entire dataset, as you do for a binary classification (as each label has a binary assignment), then aggregate it.
The easy way is to present the general form.
This is just an extension of the standard multi-class equivalent.
Here the are the true positive, false positive, true negative and false negative counts respectively for only the
label.
Here $B$ stands for any of the confusion-matrix based metric. In your case you would plug in the standard precision and recall formulas. For macro average you pass in the per label count and then sum, for micro average you average the counts first, then apply your metric function.
You might be interested to have a look into the code for the mult-label metrics here , which a part of the package mldr in R. Also you might be interested to look into the Java multi-label library MULAN.
This is a nice paper to get into the different metrics: A Review on Multi-Label Learning Algorithms
Upvotes: 24
Reputation: 360
Simple averaging will do if the classes are balanced.
Otherwise, recall for each real class needs to be weighted by prevalence of the class, and precision for each predicted label needs to be weighted by the bias (probability) for each label. Either way you get Rand Accuracy.
A more direct way is to make a normalized contingency table (divide by N so table adds up to 1 for each combination of label and class) and add the diagonal to get Rand Accuracy.
But if classes aren't balanced, the bias remains and a chance corrected method such as kappa is more appropriate, or better still ROC analysis or a chance correct measure such as informedness (height above the chance line in ROC).
Upvotes: 1
Reputation: 196
Now, to compute recall for label A you can read off the values from the confusion matrix and compute:
= TP_A/(TP_A+FN_A)
= TP_A/(Total gold labels for A)
Now, let us compute precision for label A, you can read off the values from the confusion matrix and compute:
= TP_A/(TP_A+FP_A)
= TP_A/(Total predicted as A)
You just need to do the same for the remaining labels B and C. This applies to any multi-class classification problem.
Here is the full article that talks about how to compute precision and recall for any multi-class classification problem, including examples.
Upvotes: 6
Reputation: 26881
The answer is that you have to compute precision and recall for each class, then average them together. E.g. if you classes A, B, and C, then your precision is:
(precision(A) + precision(B) + precision(C)) / 3
Same for recall.
I'm no expert, but this is what I have determined based on the following sources:
https://list.scms.waikato.ac.nz/pipermail/wekalist/2011-March/051575.html http://stats.stackexchange.com/questions/21551/how-to-compute-precision-recall-for-multiclass-multilabel-classification
Upvotes: 8