Reputation: 231
I have built a model that predicts the late arrival of flights. I want to know the true positive rate at a false positive rate of 50%. I can see this in the ROC curve I plot, but I want to calculate the value exactly rather than just read it off the plot. Does anyone have an idea how?
library(modelr)
library(dplyr)
library(sparklyr)
library(ggplot2)
library(nycflights13)
data(flights)
RNGkind(sample.kind="Rounding")
set.seed(42)
flights <- mutate(flights, late_arrival = ifelse(arr_delay > 30, 1, 0))
spark_install()
sc <- spark_connect(master = "local")
flights_tbl <- copy_to(sc, flights, "flights")
# drop rows with missing values (the pipe already supplies flights_tbl as the first argument)
flights_tbl <- flights_tbl %>% na.omit()
partition <- flights_tbl %>%
  select(late_arrival, carrier, dep_delay, month, year) %>%
  sdf_random_split(train = 0.75, test = 0.25)
train_tbl <- partition$train
test_tbl <- partition$test
########### my model
ml_formula <- formula(late_arrival ~ carrier + dep_delay + month + year)
ml_log <- ml_logistic_regression(train_tbl,ml_formula)
ml_log
pred_lr <- ml_predict(ml_log, test_tbl) %>% collect
pred_lr$p1 <- unlist(pred_lr$probability)[ c(FALSE,TRUE) ]
########## my ROC curve plot
ROC_lr <- get_roc(L = pred_lr$late_arrival, f = pred_lr$p1)
ggplot(ROC_lr, aes(x = FPR, y = TPR)) +
  geom_line(aes(col = "my prediction")) +
  ggtitle("ROC curve of my prediction",
          "logistic regression to predict late arrivals based on carrier, departure delay, month, and year")
Upvotes: 0
Views: 1100
Reputation:
I'm not familiar with get_roc(), but you can certainly print out ROC_lr to get some nearby values:
print(ROC_lr)
Alternatively, you can try another package: pROC has a function coords() that computes values of the ROC curve at a given point:
library(pROC)
# only some random values for example
labels <- c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0)
scores <- 1:10
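# instead of get_roc() you can use pROC::roc()
# note: by default roc() auto-detects the direction of the comparison; with
# real predicted probabilities you may want levels = c(0, 1), direction = "<"
roc <- roc(labels, scores)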
# let's say you want FPR 0.5 and calculate TPR
fpr <- 0.5
# coords() needs specificity (1 - FPR) to calculate sensitivity (TPR)
fpr50 <- coords(roc, 1 - fpr, input = "specificity")
# get TPR from the result
tpr <- fpr50$sensitivity
tpr
#[1] 0.6666667
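If you would rather not depend on an extra package, the same quantity can be sketched in base R. The helper below, tpr_at_fpr(), is a hypothetical name of my own (it is not part of pROC or sparklyr), and it assumes that labels are coded 0/1 and that higher scores mean "more likely positive". It reads the empirical ROC curve as a step function and returns the highest TPR reachable without exceeding the target FPR, so it can differ slightly from an interpolating method when positive and negative cases share the same score.

```r
# Hypothetical helper: TPR at a target FPR, base R only.
# Assumes labels are 0/1 and higher scores indicate the positive class.
tpr_at_fpr <- function(labels, scores, target_fpr) {
  # sort cases from the highest to the lowest score
  ord <- order(scores, decreasing = TRUE)
  y <- labels[ord]
  s <- scores[ord]
  # cumulative rates when everything down to this row is predicted positive
  tpr <- cumsum(y == 1) / sum(y == 1)
  fpr <- cumsum(y == 0) / sum(y == 0)
  # a threshold can only fall between distinct scores,
  # so keep the last row of each run of tied scores
  last_of_tie <- !duplicated(s, fromLast = TRUE)
  ok <- last_of_tie & fpr <= target_fpr
  # if no threshold stays within the target FPR, nothing is predicted positive
  if (any(ok)) max(tpr[ok]) else 0
}

# toy data for illustration only
toy_labels <- c(1, 0, 1, 0, 0, 1, 0, 1)
toy_scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)
toy_tpr <- tpr_at_fpr(toy_labels, toy_scores, target_fpr = 0.5)
toy_tpr
```

On the data from the question this would presumably be called as tpr_at_fpr(pred_lr$late_arrival, pred_lr$p1, 0.5).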
Upvotes: 1