Reputation: 9
I have class X with 1000 observations and class Y with 2000 observations. I am trying to decide which classification evaluation metric is most appropriate here and why:

1. Precision recall curve
2. AUC ROC
3. Accuracy
4. Confusion matrix and classification report
I am tempted to stick with option 4, since in my opinion this is not an imbalanced set and we do not need a precision recall curve. Please elaborate on what is appropriate here.
Upvotes: 0
Views: 1316
Reputation: 445
My very concise answer would be: all of them are useful, because each gives you a different insight into your classifier. Now let me elaborate a bit more.
First, about the classification metrics you mention. The precision recall curve and the ROC curve are closely related but not the same thing: the ROC curve plots the true positive rate against the false positive rate, while the precision recall curve plots precision against recall, and the AUC ROC is simply the area under the ROC curve. Your first two points are therefore strongly related, and you actually need both the curve and its area to get qualitative and quantitative insight into performance. What is particularly interesting about these curves is that they give you a (precision, recall) or (FPR, TPR) pair for each and every classification threshold, which is a very compact way to store meaningful information.
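As a minimal sketch (the `y_true` / `y_score` arrays below are toy placeholders, to be replaced with your own labels and predicted probabilities), both curves and the AUC can be computed with scikit-learn:

```python
from sklearn.metrics import precision_recall_curve, roc_curve, roc_auc_score

# toy data: true labels and predicted probabilities for the positive class
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)                       # ROC curve points
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)  # PR curve points
print("AUC ROC:", roc_auc_score(y_true, y_score))                           # area under the ROC curve
```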
Then, you mention accuracy, which is the most basic classification metric. It is of course interesting to display it, but keep in mind that accuracy can be misleading: imagine a binary classification problem where one class occurs in only 10% of the cases. Then a model that always predicts the other class is 90% accurate! That is why accuracy is most useful when compared to other metrics, typically precision, recall and F1 score. A quick overview of these three:

- Precision: TP / (TP + FP), the fraction of predicted positives that are actually positive.
- Recall: TP / (TP + FN), the fraction of actual positives that the model detects.
- F1 score: the harmonic mean of precision and recall, 2 * precision * recall / (precision + recall).
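These, together with accuracy, are one-liners in scikit-learn (again with toy `y_true` / `y_pred` placeholders, this time using hard class predictions rather than probabilities):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# toy data: true labels and hard class predictions
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```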
Finally, let's discuss the last point you mentioned: the confusion matrix and the classification report. Given what I wrote above, elaborating on the classification report would be redundant, because it contains exactly the previous metrics (precision, recall, F1 and support, per class). The only thing I would add is that you should probably display, or even log, a classification report each time you run your model.
As for the confusion matrix, it simply gives you the counts of true/false positives and negatives, which are the raw material from which the above metrics are computed. It is worth looking at to detect anomalies, but the raw counts alone are not enough, since the more informative quantities are the metrics derived from them.
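Both are again available in scikit-learn (reusing the toy `y_true` / `y_pred` placeholders from above):

```python
from sklearn.metrics import confusion_matrix, classification_report

# toy data: true labels and hard class predictions
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

print(confusion_matrix(y_true, y_pred))       # rows = true class, columns = predicted class
print(classification_report(y_true, y_pred))  # per-class precision, recall, F1 and support
```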
The Matthews correlation coefficient (MCC) measures the correlation between the true and predicted labels. It is particularly relevant when you have class imbalance, which does not seem to be your case, but I still mention it as a final word to end this (probably too long) elaboration.
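If you want to try it, it follows the same pattern (toy placeholders again):

```python
from sklearn.metrics import matthews_corrcoef

# toy data: true labels and hard class predictions
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

print("MCC:", matthews_corrcoef(y_true, y_pred))
```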
Upvotes: 1