user22062

Reputation: 187

How to calculate average sensitivity and specificity at specified cutoff in ROCR package?

I use ROCR package to draw the ROC curve. The code is as follows:

library(ROCR)
pred <- prediction(my.pred, my.label)
perf <- performance(pred, 'tpr', 'fpr')   # performance() takes the prediction object, not the raw scores
plot(perf, avg = "threshold")

My pred and perf objects are built from lists (one element per run) rather than single vectors, so the plot shows an averaged ROC curve. How can I calculate the average sensitivity and specificity at a specified cutoff with the ROCR package?

Upvotes: 2

Views: 7267

Answers (1)

Boris Gorelik

Reputation: 31777

Actually, ROCR is overkill for this task. ROCR's performance function returns performance metrics at every score present in its input, so in principle you could do the following:

library(ROCR)
set.seed(123)
N <- 1000
POSITIVE_CASE <- 'case A'
NEGATIVE_CASE <- 'case B'
CUTOFF <- 0.456

scores <- rnorm(n=N)
labels <- ifelse(runif(N) > 0.5, POSITIVE_CASE, NEGATIVE_CASE)
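
# ROCR infers the positive class from sort order ('case A' < 'case B' would make
# 'case B' the positive class), so state the negative and positive labels explicitly
pred <- prediction(scores, labels, label.ordering = c(NEGATIVE_CASE, POSITIVE_CASE))
perf <- performance(pred, 'sens', 'spec')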

At this point perf contains a lot of useful information:

  > str(perf)
  Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "Specificity"
  ..@ y.name      : chr "Sensitivity"
  ..@ alpha.name  : chr "Cutoff"
  ..@ x.values    :List of 1
  .. ..$ : num [1:1001] 1 1 0.998 0.996 0.996 ...
  ..@ y.values    :List of 1
  .. ..$ : num [1:1001] 0 0.00202 0.00202 0.00202 0.00405 ...
  ..@ alpha.values:List of 1
  .. ..$ : num [1:1001] Inf 3.24 2.69 2.68 2.58 ...

Now you can search for your score cut-off in perf@alpha.values and read off the corresponding sensitivity and specificity values. If your exact cut-off value does not appear in perf@alpha.values, you'll have to take the nearest one or interpolate:

ix <- which.min(abs(perf@alpha.values[[1]] - CUTOFF)) # nearest cut-off; good enough in our case
sensitivity <- perf@y.values[[1]][ix] # note the order of arguments to `performance` and of x and y in `perf`
specificity <- perf@x.values[[1]][ix]

Which gives you:

> sensitivity
[1] 0.3319838
> specificity
[1] 0.6956522
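Because ROCR's metrics only change at observed scores, you can also get the exact value at any cut-off without interpolating. The sketch below assumes ROCR's convention that a score >= cutoff is predicted positive and that alpha.values is decreasing with a leading Inf (as in the str() output above); the helper name metric_at_cutoff is hypothetical and the vectors are toy values, not real ROCR output.

```r
# Hypothetical helper: exact value of a ROCR metric at an arbitrary cutoff.
# `alpha` stands in for perf@alpha.values[[k]] (decreasing, starting at Inf),
# `y` for the matching perf@y.values[[k]] or perf@x.values[[k]].
metric_at_cutoff <- function(alpha, y, cutoff) {
  # Scores >= cutoff are predicted positive, so the metric at `cutoff` equals
  # its value at the smallest observed cutoff that is still >= `cutoff`.
  ix <- max(which(alpha >= cutoff))
  y[ix]
}

# Toy values standing in for one run's slots
alpha <- c(Inf, 2.1, 1.3, 0.4, -0.7)
sens  <- c(0, 0.25, 0.50, 0.75, 1)
metric_at_cutoff(alpha, sens, 0.456)   # returns 0.5
```

With a real performance object you would call it as metric_at_cutoff(perf@alpha.values[[1]], perf@y.values[[1]], CUTOFF).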

But there is a much simpler and faster way: convert your labels to a logical vector and calculate the metrics directly:

binary.labels <- labels == POSITIVE_CASE
tp <- sum( (scores > CUTOFF) & binary.labels )      # positives scored above the cut-off
sensitivity <- tp / sum(binary.labels)
tn <- sum( (scores <= CUTOFF) & (! binary.labels))  # negatives scored at or below the cut-off
specificity <- tn / sum(!binary.labels)

Which gives you:

> sensitivity
[1] 0.3319838
> specificity
[1] 0.6956522
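The question asks for the *average* over several runs (list inputs, as passed to prediction() for averaged curves). A minimal sketch extending the direct calculation: compute sensitivity and specificity per run at the fixed cut-off, then average across runs. It assumes parallel lists of scores and labels; the function names and toy data are hypothetical, and it uses ROCR's score >= cutoff convention.

```r
# Sensitivity and specificity of one run at a fixed cutoff
sens_spec_at_cutoff <- function(scores, labels, positive, cutoff) {
  is_pos <- labels == positive
  predicted_pos <- scores >= cutoff   # score >= cutoff is predicted positive
  c(sensitivity = sum(predicted_pos & is_pos) / sum(is_pos),
    specificity = sum(!predicted_pos & !is_pos) / sum(!is_pos))
}

# Apply to every run, then average each metric across runs
average_sens_spec <- function(scores_list, labels_list, positive, cutoff) {
  per_run <- mapply(sens_spec_at_cutoff, scores_list, labels_list,
                    MoreArgs = list(positive = positive, cutoff = cutoff))
  rowMeans(per_run)  # per_run is a 2 x n_runs matrix with named rows
}

# Toy data: two runs of four cases each
scores_list <- list(c(0.9, 0.8, 0.3, 0.1), c(0.7, 0.6, 0.4, 0.2))
labels_list <- list(c("pos", "neg", "pos", "neg"), c("pos", "pos", "neg", "neg"))
average_sens_spec(scores_list, labels_list, positive = "pos", cutoff = 0.5)
# sensitivity 0.75, specificity 0.75
```

Keeping the per-run matrix (per_run) around also lets you report the spread across runs, e.g. with apply(per_run, 1, sd).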

Upvotes: 4
