Reputation: 2437
I intend to compute accuracy/precision/recall/F1 for a sentence classification task. I have previously computed them for whole-text classification, which is quite easy, but I got confused doing it for sentence classification, since we operate at the sentence level rather than the text level. Note that a text might contain several sentences. Here is an example:
Suppose we have the following text, with predicted labels in []:
Seq2seq networks are a good way of learning sequences. [0] They perform reasonably fine at generating long sequences. [1] These networks are utilized in downstream tasks such as NMT and text summarization [0]. blah blah blah [2]
So the prediction is [0, 1, 0, 2] and suppose the gold labels for the sentences above are: [1, 1, 0, 0].
So is the accuracy of this equal to correct / total = (1 + 1) / 4 = 0.5? What about other metrics such as precision, recall, and F1? Any ideas?
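Since each sentence carries exactly one prediction, sentence-level accuracy is just the fraction of sentences whose predicted label matches the gold label. A minimal sketch, using the example arrays from the question:

```python
# Sentence-level accuracy: each sentence is one prediction,
# so accuracy is the fraction of matching labels.
y_pred = [0, 1, 0, 2]  # predicted labels from the example
y_true = [1, 1, 0, 0]  # gold labels

correct = sum(p == t for p, t in zip(y_pred, y_true))
accuracy = correct / len(y_true)
print(accuracy)  # 0.5
```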
Upvotes: 0
Views: 1267
Reputation: 2437
While eagerly looking for a solution to this, I drew some inspiration from a related task (NER) and from the definitions of precision and recall; once those are computed, the F1 score follows easily.
By definition:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 · Precision · Recall / (Precision + Recall)
I noticed that all I need is to compute TP, FP, and FN. For example, for the predictions [0, 0, 1, 1] with true labels [0, 0, 1, 0], taking 1 as the positive class gives TP = 1, FP = 1, and FN = 0. Thus Precision = 1 / 2 = 0.5, Recall = 1 / 1 = 1.0, and F1 = 2 · 0.5 · 1.0 / 1.5 ≈ 0.67.
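A short sketch for counting TP, FP, and FN directly from the two label arrays and deriving the metrics for the positive class (label 1):

```python
# Per-class counts for the positive class (label 1).
y_pred = [0, 0, 1, 1]
y_true = [0, 0, 1, 0]

tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))  # predicted 1, truly 1
fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))  # predicted 1, truly 0
fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))  # predicted 0, truly 1

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, precision, recall, round(f1, 3))
```

The same counting generalizes to any class by swapping which label is treated as positive.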
Here, since model performance on the positive class matters most to me, I compute these metrics only for the positive class. I also realized that this is just the basic usage of the F1 metric; only the level of granularity differs between tasks. Hope this helps anyone who has been puzzled by this issue.
Upvotes: 1
Reputation: 400
The questioner is asking how to approach measuring model performance, rather than for a programmatic solution in a particular language or library. Before attempting an answer, it helps to consider a few questions; they point toward the best approach.
As a final note, whether precision, recall, or accuracy is the best measure depends on the trade-off one wishes to make, and I will not comment on that here.
Upvotes: -1
Reputation: 177
In the case of multi-class classification, you can get the precision, recall, and F1 score using metrics.classification_report(). It reports these metrics for each individual class, as well as their 'macro', 'micro', 'weighted', and 'samples' averages.
from sklearn import metrics
# True values
y_true = [1,1,0,0]
# Predicted values
y_pred = [0,1,0,2]
# Print the confusion matrix
print(metrics.confusion_matrix(y_true, y_pred))
# Print the precision and recall, among other metrics
print(metrics.classification_report(y_true, y_pred))
Upvotes: 1