beavis11111

Reputation: 574

R: xgboost plot roc curve

To plot an ROC curve:

library(ROCR)
<data cleaning/scrubbing>
<train data>
.....
.....
rf.perf = performance(rf.prediction, "tpr", "fpr") #for RF
logit.perf = performance(logit.prediction, "tpr", "fpr") #for logistic reg
tree.perf = performance(tree.prediction, "tpr", "fpr") #for cart tree
...
plot(rf.perf) #an RF roc curve

Now suppose I want to run an xgboost classification (objective = "binary:logistic") and subsequently plot its ROC curve.

I'm confused by xgboost's metrics argument "auc" (page 9 of the CRAN manual); it only says "area". How does one plot the curve with TPR and FPR for model comparison?
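I'm guessing the xgboost part would look roughly like this (just a sketch; train.matrix, train.labels, test.matrix and test.labels stand in for my own objects), but I'm not sure predict() returns probabilities that ROCR's prediction() will accept:

library(xgboost)
xgb.model = xgboost(data = train.matrix, label = train.labels,
                    objective = "binary:logistic", nrounds = 100)
xgb.probs = predict(xgb.model, test.matrix) #probabilities?
xgb.prediction = prediction(xgb.probs, test.labels)
xgb.perf = performance(xgb.prediction, "tpr", "fpr") #for xgboost?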

I tried searching the net and GitHub; most of the emphasis is on the feature importance graph (for xgboost).

Thanks

Upvotes: 4

Views: 12230

Answers (1)

Tushar Gupta

Reputation: 1669

Let me first talk about the ROC curve.

The ROC curve is created by plotting the true positive rate (TPR = TP / (TP + FN)) against the false positive rate (FPR = FP / (FP + TN)) at various threshold settings.

In Python it can be done easily as:

from sklearn import metrics
import matplotlib.pyplot as plt

def buildROC(target_test, test_preds):
    # tpr/fpr at every threshold, plus the scalar area under the curve
    fpr, tpr, thresholds = metrics.roc_curve(target_test, test_preds)
    roc_auc = metrics.auc(fpr, tpr)
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
    plt.legend(loc = 'lower right')
    plt.plot([0, 1], [0, 1], 'r--')  # diagonal = random classifier
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.gcf().savefig('roc.png')

[image: example ROC curve]

For example, in the image above, at a certain threshold and at the cost of a false positive rate of 0.2, we get a true positive rate of nearly 0.96-0.97.
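Since the question is in R: the same idea works with ROCR, because xgboost with objective = "binary:logistic" predicts probabilities that can go straight into prediction(). A minimal sketch, assuming xgb.probs holds your model's predicted probabilities and test.labels the true 0/1 labels:

library(ROCR)
xgb.prediction = prediction(xgb.probs, test.labels)
xgb.perf = performance(xgb.prediction, "tpr", "fpr")
plot(xgb.perf) #xgboost roc curve, comparable to the RF/logit/tree ones

# the "auc" metric in the manual is just the area under this curve
auc = performance(xgb.prediction, "auc")@y.values[[1]]

# e.g. recover the cutoff nearest fpr = 0.2, as read off the image above
fpr = xgb.perf@x.values[[1]]
tpr = xgb.perf@y.values[[1]]
cutoffs = xgb.perf@alpha.values[[1]]
i = which.min(abs(fpr - 0.2))
c(cutoff = cutoffs[i], fpr = fpr[i], tpr = tpr[i])

So metrics = "auc" only reports the area as a single number (handy for tuning); the curve itself still comes from ROCR or a similar package.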

Good documentation on ROC

Upvotes: 4
