Reputation: 574
To plot a ROC curve:
library(ROCR)
<data cleaning/scrubbing>
<train data>
.....
.....
# each *.prediction below is a ROCR prediction object, i.e. prediction(probs, labels)
rf.perf = performance(rf.prediction, "tpr", "fpr")        # for RF
logit.perf = performance(logit.prediction, "tpr", "fpr")  # for logistic reg
tree.perf = performance(tree.prediction, "tpr", "fpr")    # for CART tree
...
plot(rf.perf) # an RF ROC curve
If I want to run an xgboost classification and subsequently plot the ROC curve:
objective = "binary:logistic"
I'm confused by xgboost's metrics = "auc" argument (page 9 of the CRAN manual); its description just says "area", i.e. it returns a single number rather than a curve. How does one plot the curve with TPR and FPR for model comparison?
I tried searching the net and GitHub; most results emphasize the feature importance graph for xgboost.
Thanks
Upvotes: 4
Views: 12230
Reputation: 1669
Let me first talk about the ROC curve.
The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
In Python it can be done easily as:
import matplotlib.pyplot as plt
from sklearn import metrics

def buildROC(target_test, test_preds):
    # compute FPR/TPR pairs over all thresholds, then the area under the curve
    fpr, tpr, threshold = metrics.roc_curve(target_test, test_preds)
    roc_auc = metrics.auc(fpr, tpr)
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
    plt.legend(loc = 'lower right')
    plt.plot([0, 1], [0, 1], 'r--')  # diagonal reference line (random classifier)
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.gcf().savefig('roc.png')
For example, in the plot this produces, at a certain threshold and at the cost of a false positive rate of 0.2, we can get a true positive rate of nearly 0.96-0.97.
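To tie this back to the original R workflow: with objective = "binary:logistic", xgboost's predict() returns predicted probabilities, which can be fed into ROCR exactly like the other models; the "auc" metric only controls the summary number xgboost reports during training, not what predict() returns. A minimal sketch, assuming a numeric feature matrix train.x/test.x and 0/1 label vectors train.y/test.y (all hypothetical names):

library(xgboost)
library(ROCR)

# train a binary classifier; eval_metric = "auc" affects only the printed
# training metric, not the predictions
bst = xgboost(data = train.x, label = train.y, nrounds = 50,
              objective = "binary:logistic", eval_metric = "auc")

xgb.probs = predict(bst, test.x)                 # predicted probabilities
xgb.prediction = prediction(xgb.probs, test.y)   # ROCR prediction object
xgb.perf = performance(xgb.prediction, "tpr", "fpr")

plot(xgb.perf, col = "blue")             # xgboost ROC curve
plot(rf.perf, col = "red", add = TRUE)   # overlay another model's curve for comparison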
Upvotes: 4