E_H

Reputation: 231

How to calculate true positive rate?

I have built a model that predicts late arrival of flights. I want to find the true positive rate at a false positive rate of 50%. I can see this in the ROC curve I plot, but I want to calculate the value exactly rather than read it off the plot. Does anyone have an idea how?

library(modelr)
library(dplyr)
library(sparklyr)
library(ggplot2)
library(nycflights13)
data(flights)

RNGkind(sample.kind="Rounding")
set.seed(42)

flights <- mutate(flights, late_arrival = ifelse(arr_delay > 30, 1, 0))

spark_install()
sc <- spark_connect(master = "local")

flights_tbl <- copy_to(sc, flights, "flights")
flights_tbl <- flights_tbl %>% na.omit()

partition <- flights_tbl %>% 
  select(late_arrival, carrier, dep_delay, month, year) %>%
  sdf_random_split(train = 0.75, test = 0.25)

train_tbl <- partition$train
test_tbl <- partition$test

########### my model
ml_formula <- formula(late_arrival ~ carrier + dep_delay + month + year)
ml_log <- ml_logistic_regression(train_tbl,ml_formula)
ml_log

pred_lr <- ml_predict(ml_log, test_tbl) %>% collect
pred_lr$p1 <- unlist(pred_lr$probability)[ c(FALSE,TRUE) ]

########## my ROC curve plot
ROC_lr <- get_roc(L = pred_lr$late_arrival, f = pred_lr$p1)
ggplot(ROC_lr, aes(x = FPR, y = TPR)) + geom_line(aes(col = "my prediction")) + ggtitle("ROC curve of my prediction", "logistic regression to predict late arrivals based on carrier, departure delay, month, and year")

Upvotes: 0

Views: 1100

Answers (1)

user13653858


I'm not familiar with get_roc(), but you can print ROC_lr and look at the rows where FPR is close to 0.5:

print(ROC_lr)
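
If ROC_lr is a data frame with numeric FPR and TPR columns (which the ggplot call in the question suggests, but I have not checked), one option is to interpolate linearly between the neighbouring points with base R's approx(). A minimal sketch, using a made-up data frame roc_df as a stand-in for ROC_lr:

```r
# Hypothetical stand-in for ROC_lr; the real values would come from get_roc()
roc_df <- data.frame(
  FPR = c(0, 0.1, 0.4, 0.6, 1),
  TPR = c(0, 0.5, 0.8, 0.9, 1)
)

# Linearly interpolate TPR at FPR = 0.5;
# ties = max keeps the highest TPR if the same FPR value appears more than once
tpr_at_50 <- approx(x = roc_df$FPR, y = roc_df$TPR, xout = 0.5, ties = max)$y

tpr_at_50
```

With the real data you would pass ROC_lr$FPR and ROC_lr$TPR instead. The result is only as fine-grained as the points in the data frame.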

Alternatively, you can try another package: pROC has a function coords() that computes values on the ROC curve at a given point:

library(pROC)

# only some random values for example
labels <- c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0)
scores <- 1:10

# instead of get_roc() you can use pROC::roc()
roc <- roc(labels, scores)

# let's say you want FPR 0.5 and calculate TPR
fpr <- 0.5

# coords() needs specificity (1 - FPR) to calculate sensitivity (TPR)
fpr50 <- coords(roc, 1 - fpr, input = "specificity")

# get TPR from the result
tpr <- fpr50$sensitivity

tpr
#[1] 0.6666667
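
If you would rather not depend on a ROC package, the value can also be computed straight from the definitions: pick the score threshold that half of the true negatives exceed (FPR = 0.5), then measure the share of true positives above that threshold (TPR). A rough sketch with made-up data, assuming a higher score means "more likely positive"; the names toy_labels, toy_scores and target_fpr are only for illustration:

```r
# Made-up example data (not the flights predictions)
toy_labels <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
toy_scores <- c(0.1, 0.2, 0.3, 0.4, 0.25, 0.6, 0.7, 0.8, 0.5, 0.9)

target_fpr <- 0.5

neg_scores <- toy_scores[toy_labels == 0]
pos_scores <- toy_scores[toy_labels == 1]

# Threshold such that a fraction target_fpr of the negatives score above it;
# type = 1 returns an observed score rather than an interpolated one
threshold <- unname(quantile(neg_scores, probs = 1 - target_fpr, type = 1))

# Rates by definition: share of negatives / positives predicted positive
fpr <- mean(neg_scores > threshold)
tpr <- mean(pos_scores > threshold)

c(threshold = threshold, FPR = fpr, TPR = tpr)
```

For the model in the question, toy_labels and toy_scores would be replaced by pred_lr$late_arrival and pred_lr$p1. With tied scores or few negatives, the achieved FPR may not hit the target exactly, so it is worth checking fpr before trusting tpr.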

Upvotes: 1
