Reputation: 955
This questions concerns operation in the language of tidyverse
. I am trying to use tidyr::nest
and purrr:map2
to perform a bivariate function on two columns of a tibble
, replacing them with two other columns which are the result of that bivariate function. The operation is that of computing ROC based on values of the statistic under H0
and H1
which produces two new values (i.e. columns) FPR
and TPR
. Here is a working example:
library(tidyverse)
library(purrr)
# function to compute the rejection rates
get_reject_freq <- function(Tstat, th_vec, twosided=T) {
# Tstat is a vector, th could be a vector of thresholds threshold
if (twosided) Tstat <- abs(Tstat)
sapply(th_vec, function(th) mean(Tstat > th))
}
# function to compute the ROC
get_ROC <- function(T0, T1, twosided=T) {
T0_sorted <- sort(unique(T0), decreasing = T)
tibble(FPR = get_reject_freq(T0, T0_sorted, twosided = twosided),
TPR = get_reject_freq(T1, T0_sorted, twosided = twosided))
}
n = m = 15
run_sims_one_iter <- function(j) {
x = rt(n, df=5, ncp=0)
y = list(H0=rt(m, df=5, ncp=0), H1=rt(m, df=5, ncp=1))
result = NULL
for (h in c("H0","H1")) {
result[[h]] = tibble(method="t_test", H=h,
test_stat=t.test(x,y[[h]])$statistic) %>%
add_row(method="wilcoxon", H=h,
test_stat=wilcox.test(x,y[[h]], alternative = "two.sided")$statistic, )
}
return( bind_rows(result) )
}
result = bind_rows( lapply(1:100, run_sims_one_iter) )
#### The following can hopefully be improved ###
temp = result %>%
group_by(method,H) %>%
nest() %>%
pivot_wider(names_from = H, values_from = data) %>%
ungroup()
roc_results = bind_rows(
lapply(1:nrow(temp), function(i) {
get_ROC( temp[[i,"H0"]]$test_stat, temp[[i,"H1"]]$test_stat) %>%
add_column(method = temp[i,]$method)
}
))
The line
temp = result %>%
group_by(method,H) %>%
nest() %>%
pivot_wider(names_from = H, values_from = data) %>%
ungroup()
produces an output of the form:
# A tibble: 2 x 3
method H0 H1
<chr> <list> <list>
1 t_test <tibble [100 × 1]> <tibble [100 × 1]>
2 wilcoxon <tibble [100 × 1]> <tibble [100 × 1]>
The code should operate on each row taking the two tibbles in H0
and H1
columns, passing them through the get_ROC
function and replacing them with FPR
and TPR
columns and then unnest
everything. The desired roc_result
generated by the above code is
roc_results
# A tibble: 157 x 3
FPR TPR method
<dbl> <dbl> <chr>
1 0.03 0.76 t_test
2 0.04 0.77 t_test
3 0.07 0.82 t_test
...
Ideally, I would like to replace the construction of temp
and roc_results
with a single line of the form:
temp = result %>%
group_by(method,H) %>%
nest() %>%
pivot_wider(names_from = H, values_from = data) %>%
ungroup() %>%
mutate(res=map2(unlist(H0), unlist(H1), get_ROC)) %>% unnest(res)
But this doesn't work. I guess the issue might be that the size of the output of get_ROC
could change for every row (?). Any idea how I can perform all the operations using the tidyverse
approach.
Upvotes: 0
Views: 1042
Reputation: 389325
You were in the right direction but you had to unlist
in the function of map2
instead of in the arguments.
library(dplyr)
library(tidyr)
result %>%
group_by(method,H) %>%
nest() %>%
pivot_wider(names_from = H, values_from = data) %>%
mutate(res = purrr::map2(H0, H1, ~get_ROC(unlist(.x), unlist(.y)))) %>%
unnest(res) %>%
select(-c(H0, H1))
# method FPR TPR
# <chr> <dbl> <dbl>
# 1 t_test 0.01 0.49
# 2 t_test 0.06 0.59
# 3 t_test 0.08 0.65
# 4 t_test 0.1 0.74
# 5 t_test 0.11 0.77
# 6 t_test 0.13 0.82
# 7 t_test 0.19 0.84
# 8 t_test 0.21 0.84
# 9 t_test 0.22 0.85
#10 t_test 0.24 0.86
# … with 156 more rows
Upvotes: 1