xpotion
xpotion

Reputation: 47

Plot a ROC curve in R without using any packages

Hi i am pretty new to programming in R and i am having troble plotting a ROC curve without using any package.

I generated my data using:

d=rpearsonIII(100,0.5,360,20)
nd=rnorm(100,450,25)

i need a vector with values <400 for d and >400 for nd, so i did:

spec = (cumsum(nd[nd>400])/sum(nd))*100
sens = (cumsum(d[d<400])/sum(nd))*100

and the i plotted like this:

plot(1-spec,sens)

but the plot was nothing like i expected it to be

Edit: Thanks to the advice given my code looks like this now:

sc2 = c(rnorm(50,450,25),rpearsonIII(50,0.5,360,20))
scF = sc2 < 395

thresholds <- sort(sc2)

pos <- sum(scF);pos
neg <- sum(!scF);neg

tn <- cumsum(!scF);tn
spec <- tn/neg;spec

tp <- pos - cumsum(scF);tp
sens <- tp/pos;sens

plot(1 - spec, sens, type = "l", col = "red", 
     ylab = "Sensitivity", xlab = "1 - Specificity")
abline(c(0,0),c(1,1))

The plotted roc curve looks like this: roc curve

My problem now is that if change the order of the generated data (rnorm and rpearsonIII), the curve is reversed.

Upvotes: 3

Views: 3869

Answers (1)

Barker
Barker

Reputation: 2094

I don't know what rpearsonIII is, so I am just going to make a sample random data with actual classes actuals as well as the scores for the predictions scores.

set.seed(100)
actuals <- sample(c(TRUE,FALSE), 100, replace = TRUE)
scores <- runif(100,-1,1)

The long version with explanation

If in your data the actuals are strings or factors rather than logicals, you will need to convert them to logicals using:

actuals <- actuals == "postiveClass"

Next we want to order the instances based on their scores. We can do this using:

actuals <- actuals[order(scores)]

If you want to keep track of the thresholds for the sensitivities and specificity, you can keep them aligned using:

thresholds <- sort(scores)

Now we need to get our sensitivities and specificities. Sensitivity is TP/P and specificity is TN/N. Getting the total number of positives P is easy, since our actuals are logical, we can just use sum(actuals). Similarity, we can get our negatives N using sum(!actuals).

pos <- sum(actuals)
neg <- sum(!actuals)

First lets get our true negatives at each threshold. That is pretty easy, it is just the number of FALSE values at or below each the threshold. Since our data are in order by threshold, we can calculate that (and the specificity) using:

tn <- cumsum(!actuals)
spec <- tn/neg

The number of true positives is slightly harder because we are looking for the number of positives greater that the threshold, so cumsum alone won't work. However, since the number above the threshold is equal to the total minus number below or at the threshold, we can get our true positives using:

tp <- pos - cumsum(actuals)
sens <- tp/pos

Now all we need to do is plot the two.

plot(1 - spec, sens, type = "l", col = "red", 
     ylab = "Sensitivity", xlab = "1 - Specificity")
abline(c(0,0),c(1,1))

ROC Plot

To get the AUC of the curve, we simply need to calculate the height of the curve (the sensitivity) multiplied by the width the (difference in 1 - specificity) at each value of actuals. We already have the sensitivity, we just need the specificity. The diff function will give us our difference in adjacent values of specificity, however, we need to put a 0 value at the beginning to get the width of the first columns.

width <- diff(c(0, 1 - sens))
auc <- sum(spec*width)

the minimal code version

actuals <- actuals[order(scores)]

sens <- (sum(actuals) - cumsum(actuals))/sum(actuals)
spec <- cumsum(!actuals)/sum(!actuals)

plot(1 - spec, sens, type = "l", col = "red", 
     ylab = "Sensitivity", xlab = "1 - Specificity")
abline(c(0,0),c(1,1))

(auc <- sum(spec*diff(c(0, 1 - sens))))

Upvotes: 6

Related Questions