Reputation: 53
I am getting strange ROC curve results when building a random forest model for a highly imbalanced two-class prediction problem. The original event rate in the sample is ~2%, and I am using case weights to counter the class imbalance.
In this case I have weighted my sample so that the event rate is 1:4 (25%).
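The post does not show how the `wt` column was built. As a rough sketch only, one way to get a weighted event share of 25% is to give non-events a weight of 1 and scale the event weight accordingly. The data below is simulated, and `target`, `w_event` and `wt` are hypothetical names chosen for illustration.

```r
# Simulated stand-in for the real sample (assumption: ~2% events)
set.seed(98)
n <- 10000
fraud <- rbinom(n, 1, 0.02)

target <- 0.25            # desired weighted share of events
p <- mean(fraud)          # observed event rate

# Non-events keep weight 1; events get the weight that makes
# the weighted event share equal to `target`
w_event <- target * (1 - p) / ((1 - target) * p)
wt <- ifelse(fraud == 1, w_event, 1)

# Check the weighted event share
weighted_share <- sum(wt[fraud == 1]) / sum(wt)
print(weighted_share)
```

In ranger, `case.weights` act as sampling weights: observations with larger weights are more likely to be drawn into each tree's bootstrap or subsample.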
My model is set up in the following way:
library(ranger)

# Probability forest with case weights to counter the class imbalance
forest <- ranger(data = sample[, c('fraud', features)]
                 , num.trees = 350
                 , case.weights = sample$wt
                 , probability = TRUE
                 , importance = 'impurity'
                 , write.forest = TRUE
                 , sample.fraction = 0.5
                 , seed = 98
                 , dependent.variable.name = 'fraud')
I am getting pretty good results with this setup, as you can see in the confusion matrix below:
        predicted
true         0       1
   0    815800   11391
   1     13283    5503
True positive rate (recall, taking fraud = 1 as the positive class) - 29%
Positive predictive value (precision) - 33%
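To make the arithmetic behind these two percentages explicit, here is a small sketch that recomputes the usual rates from the confusion matrix above. It assumes fraud = 1 is the positive class; under that assumption the 29% and 33% figures correspond to the true positive rate and the positive predictive value.

```r
# Confusion matrix from the post: rows = true class, columns = predicted class
cm <- matrix(c(815800, 11391,
                13283,  5503),
             nrow = 2, byrow = TRUE,
             dimnames = list(true = c("0", "1"), predicted = c("0", "1")))

# Assumption: class "1" (fraud) is the positive class
tn <- cm["0", "0"]; fp <- cm["0", "1"]
fn <- cm["1", "0"]; tp <- cm["1", "1"]

tpr <- tp / (tp + fn)   # true positive rate (recall)
tnr <- tn / (tn + fp)   # true negative rate (specificity)
ppv <- tp / (tp + fp)   # positive predictive value (precision)
npv <- tn / (tn + fn)   # negative predictive value

# Rates in percent, rounded to one decimal place
rates <- round(100 * c(TPR = tpr, TNR = tnr, PPV = ppv, NPV = npv), 1)
print(rates)
```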
However, when I draw the ROC curve, I get the following plot:
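library(ROCR)

# Column 2 of the forest's predicted probabilities (assumed to be P(fraud = 1)) vs. the true labels
perf <- prediction(forest$predictions[,2], sample$fraud)
# Plots true negative rate (y-axis) against false negative rate (x-axis)
pred3 <- performance(perf, "tnr", "fnr")
plot(pred3, main="ROC Curve for Random Forest", col="blue", lwd=2)
abline(a=0,b=1,lwd=2,lty=2,col="gray")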
I can't understand why my predictions only start to perform beyond 50% of the decision interval. Does anyone have a clue or previous experience with this?
Upvotes: 1
Views: 826
Reputation: 322
We normally plot the true positive rate against the false positive rate in a ROC curve, but you have plotted the true negative rate against the false negative rate. Maybe that's why.
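To illustrate the conventional axes, here is a minimal base-R sketch on simulated data (not your sample; `label` and `score` are made-up stand-ins for `sample$fraud` and `forest$predictions[,2]`). It sweeps the threshold by hand and puts the false positive rate on the x-axis and the true positive rate on the y-axis.

```r
# Simulated labels and scores (hypothetical data for illustration only)
set.seed(1)
n <- 2000
label <- rbinom(n, 1, 0.25)                         # 1 = event (positive class)
score <- rnorm(n, mean = ifelse(label == 1, 1, 0))  # higher score = more likely event

# Sweep thresholds from high to low; predict 1 when score >= threshold
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) sum(score >= t & label == 1) / sum(label == 1))
fpr <- sapply(thresholds, function(t) sum(score >= t & label == 0) / sum(label == 0))

# Conventional ROC curve: FPR on the x-axis, TPR on the y-axis
plot(fpr, tpr, type = "l", col = "blue", lwd = 2,
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve (simulated data)")
abline(a = 0, b = 1, lwd = 2, lty = 2, col = "gray")

# The rates in the question are the complements of these two
tnr <- 1 - fpr
fnr <- 1 - tpr
```

With ROCR, the same axes should come from `performance(perf, "tpr", "fpr")` in place of `performance(perf, "tnr", "fnr")`.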
Upvotes: 1