Reputation: 183
I have a dataset to which I add 10-30% artificial data, and I run an algorithm to classify which data points are original and which are artificial. I got the attached ROC curves. I've never seen ROC curves end like that. Am I doing something wrong, or is such a pattern possible? If so, what would explain it?
Thanks
Upvotes: 4
Views: 726
Reputation: 19169
You could see a ROC curve similar to the one you have shown if your target data have an unbalanced bimodal distribution with a noise/background distribution located between the two modes. Initially (as in your plot), the ROC curve rises steeply as the threshold sweeps across the main peak of the true positive (TP) distribution. Next comes a relatively flat region where you accumulate false positives (FPs) without much increase in TPs. Finally, you hit the second, smaller cluster of TPs, and the curve rises again to reach a TP rate of 1.
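To make the scenario above concrete, here is a minimal simulation sketch using only the Python standard library. Everything in it is an assumption chosen for illustration, not taken from your data: the one-dimensional score, the cluster centers (3, 0 and -3), the spread (0.5), the 15% share of the minor positive cluster, and the helper names `simulate_scores`, `roc_points` and `tpr_at_fpr` are all hypothetical.

```python
import random


def simulate_scores(n_pos=2000, n_neg=2000, minor_frac=0.15, seed=0):
    """Draw hypothetical classifier scores (higher = more 'positive').

    Positives are bimodal: most sit near +3, a minority near -3.
    Negatives (the background) sit between the two modes, near 0.
    All of these numbers are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [
        rng.gauss(-3.0, 0.5) if rng.random() < minor_frac else rng.gauss(3.0, 0.5)
        for _ in range(n_pos)
    ]
    neg = [rng.gauss(0.0, 0.5) for _ in range(n_neg)]
    return pos, neg


def roc_points(pos, neg):
    """Return (FPR, TPR) pairs obtained by lowering the threshold one sample at a time."""
    labeled = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    labeled.sort(key=lambda item: item[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in labeled:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / len(neg), tp / len(pos)))
    return points


def tpr_at_fpr(points, target_fpr):
    """Highest TPR reached while the FPR is still at or below target_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= target_fpr)


pos, neg = simulate_scores()
points = roc_points(pos, neg)
for target in (0.01, 0.10, 0.50, 0.90, 0.99, 1.00):
    print(f"FPR <= {target:.2f}: TPR = {tpr_at_fpr(points, target):.3f}")
```

With these assumed settings, the printed TP rate should jump to roughly the main cluster's share almost immediately, stay nearly constant across the middle of the FP range, and only reach 1 at the very end, which is the steep-flat-steep shape described above.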
I'm guessing that your artificial data lie closer to the centroid of the main cluster of TPs, which is why adding more artificial data tends to de-emphasize the smaller TP cluster and makes the result look more like a typical ROC curve.
As I mentioned in my comment, it would be informative to plot the ROC curve without any artificial data. It could also help to show a version zoomed in on the tail end of the plot, where the TP rate approaches 1 (i.e., to see whether the curve flattens as it approaches 1).
Upvotes: 2