Reputation: 23
I am trying to plot a ROC curve for an identifier used to distinguish positive instances from a background dataset. The identifier outputs a list of probability scores, with some overlap between the two groups.
FG BG
0.02 0.10
0.03 0.25
0.02 0.12
0.04 0.16
0.05 0.45
0.12 0.31
0.13 0.20
(where FG = Positive and BG = Negative.)
I am plotting the ROC curve with PRROC in R to assess how well the identifier classifies the data into the correct group. Although the classifier values for the positive and negative datasets are clearly distinct, my current ROC plot shows a low AUC. The probability scores for my positive data are lower than those for the background, so if I switch the classification around and treat the background as the foreground points, I get a high AUC. I am not 100% clear why this is the case, which plot is the best to use, or whether there is an additional step I missed before analysing my data.
roc <- roc.curve(scores.class0 = FG, scores.class1 = BG, curve = TRUE)
ROC curve
Area under curve:
0.07143
roc2 <- roc.curve(scores.class0 = BG, scores.class1 = FG, curve = TRUE)
ROC curve
Area under curve:
0.92857
Upvotes: 2
Views: 1137
Reputation: 7969
As you have indeed noticed, most ROC analysis tools assume that the scores in your positive class are higher than those of the negative class. More formally, an instance is classified as "positive" if X > T, where T is the decision threshold, and negative otherwise.
There is no fundamental reason for it to be so. It is perfectly valid to use a decision rule such as X < T; however, most ROC software doesn't offer that option.
Your first call, which gives AUC = 0.07143, would imply that your classifier performs worse than random. This is not correct: the low value only reflects that your scores run in the opposite direction from the one the tool assumes.
As you noticed, swapping the class labels yields the correct AUC value. This is possible because ROC curves are insensitive to class distributions, so the classes can be reversed without a problem. However, I wouldn't personally recommend that option. I can see two cases where this can be misleading:
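To see where 0.07143 comes from, the AUC can be computed by hand as the fraction of (positive, negative) pairs in which the positive instance scores higher, counting ties as one half. The following is a minimal base R sketch using the scores from the question; `pairwise_auc` is a hypothetical helper written for illustration and is not part of PRROC.

```r
# Scores from the question: FG = positive class, BG = negative class.
FG <- c(0.02, 0.03, 0.02, 0.04, 0.05, 0.12, 0.13)
BG <- c(0.10, 0.25, 0.12, 0.16, 0.45, 0.31, 0.20)

# Hypothetical helper: AUC as the probability that a randomly chosen positive
# scores higher than a randomly chosen negative (ties count as one half).
pairwise_auc <- function(pos, neg) {
  diffs <- outer(pos, neg, "-")  # every positive-minus-negative difference
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

auc_fg_positive <- pairwise_auc(FG, BG)
round(auc_fg_positive, 5)  # 0.07143
```

Only 3.5 of the 49 pairs have the positive scoring higher under the X > T convention, so the value says the ranking is almost perfectly reversed rather than that the scores carry no information.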
An alternative and preferable approach would be to invert your scores for this analysis, so that the positive class effectively has higher scores:
roc <- roc.curve(scores.class0 = -FG, scores.class1 = -BG, curve = TRUE)
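As a quick sanity check, negating the scores should give the same AUC as swapping the labels while keeping FG as the positive class. The sketch below uses base R only, with the data from the question; `pairwise_auc` is a hypothetical helper (not a PRROC function) that computes the AUC as the fraction of (positive, negative) pairs ranked correctly.

```r
# Scores from the question: FG = positive class, BG = negative class.
FG <- c(0.02, 0.03, 0.02, 0.04, 0.05, 0.12, 0.13)
BG <- c(0.10, 0.25, 0.12, 0.16, 0.45, 0.31, 0.20)

# Hypothetical helper (not part of PRROC): fraction of (positive, negative)
# pairs in which the positive scores higher, with ties counted as one half.
pairwise_auc <- function(pos, neg) {
  diffs <- outer(pos, neg, "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

auc_negated <- pairwise_auc(-FG, -BG)  # negated scores, labels unchanged
auc_swapped <- pairwise_auc(BG, FG)    # original scores, labels swapped

round(c(negated = auc_negated, swapped = auc_swapped), 5)  # both 0.92857
```

Both routes give the same number, but with negation FG remains the positive class in the call.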
Upvotes: 0