passerby51
passerby51

Reputation: 955

Applying purrr::map2 on two columns of nested tibble

This questions concerns operation in the language of tidyverse. I am trying to use tidyr::nest and purrr:map2 to perform a bivariate function on two columns of a tibble, replacing them with two other columns which are the result of that bivariate function. The operation is that of computing ROC based on values of the statistic under H0 and H1 which produces two new values (i.e. columns) FPR and TPR. Here is a working example:

library(tidyverse)
library(purrr)
# function to compute the rejection rates
get_reject_freq <- function(Tstat, th_vec, twosided=T) {
  # Tstat is a vector, th could be a vector of thresholds threshold
  if (twosided) Tstat <- abs(Tstat)
  sapply(th_vec, function(th) mean(Tstat > th))
}

# function to compute the ROC
get_ROC <- function(T0, T1, twosided=T) {
  T0_sorted <- sort(unique(T0), decreasing = T)
  tibble(FPR = get_reject_freq(T0, T0_sorted, twosided = twosided), 
         TPR = get_reject_freq(T1, T0_sorted, twosided = twosided))
}

n = m = 15
run_sims_one_iter <- function(j) {
  x = rt(n, df=5, ncp=0)
  y = list(H0=rt(m, df=5, ncp=0), H1=rt(m, df=5, ncp=1))

  result = NULL
  for (h in c("H0","H1")) {
    result[[h]] = tibble(method="t_test", H=h, 
                         test_stat=t.test(x,y[[h]])$statistic) %>% 
      add_row(method="wilcoxon", H=h, 
              test_stat=wilcox.test(x,y[[h]], alternative = "two.sided")$statistic, )
  }
  return( bind_rows(result) )
}

result = bind_rows( lapply(1:100, run_sims_one_iter) )


#### The following can hopefully be improved ###
temp = result %>% 
  group_by(method,H) %>% 
  nest() %>%
  pivot_wider(names_from = H, values_from = data) %>%
  ungroup()


roc_results = bind_rows( 
  lapply(1:nrow(temp), function(i) {
    get_ROC( temp[[i,"H0"]]$test_stat, temp[[i,"H1"]]$test_stat) %>% 
      add_column(method = temp[i,]$method)
  }
))

The line

temp = result %>% 
  group_by(method,H) %>% 
  nest() %>%
  pivot_wider(names_from = H, values_from = data) %>%
  ungroup()

produces an output of the form:

# A tibble: 2 x 3
  method   H0                 H1                
  <chr>    <list>             <list>            
1 t_test   <tibble [100 × 1]> <tibble [100 × 1]>
2 wilcoxon <tibble [100 × 1]> <tibble [100 × 1]>

The code should operate on each row taking the two tibbles in H0 and H1 columns, passing them through the get_ROC function and replacing them with FPR and TPR columns and then unnest everything. The desired roc_result generated by the above code is

 roc_results
# A tibble: 157 x 3
     FPR   TPR method
   <dbl> <dbl> <chr> 
 1  0.03  0.76 t_test
 2  0.04  0.77 t_test
 3  0.07  0.82 t_test
...

Ideally, I would like to replace the construction of temp and roc_results with a single line of the form:

temp = result %>% 
  group_by(method,H) %>% 
  nest() %>%
  pivot_wider(names_from = H, values_from = data) %>%
  ungroup() %>%
  mutate(res=map2(unlist(H0), unlist(H1), get_ROC)) %>% unnest(res)

But this doesn't work. I guess the issue might be that the size of the output of get_ROC could change for every row (?). Any idea how I can perform all the operations using the tidyverse approach.

Upvotes: 0

Views: 1042

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389325

You were in the right direction but you had to unlist in the function of map2 instead of in the arguments.

library(dplyr)
library(tidyr)

result %>% 
  group_by(method,H) %>% 
  nest() %>%
  pivot_wider(names_from = H, values_from = data) %>% 
  mutate(res = purrr::map2(H0, H1, ~get_ROC(unlist(.x), unlist(.y)))) %>%
  unnest(res) %>%
  select(-c(H0, H1))

#  method   FPR   TPR
#   <chr>  <dbl> <dbl>
# 1 t_test  0.01  0.49
# 2 t_test  0.06  0.59
# 3 t_test  0.08  0.65
# 4 t_test  0.1   0.74
# 5 t_test  0.11  0.77
# 6 t_test  0.13  0.82
# 7 t_test  0.19  0.84
# 8 t_test  0.21  0.84
# 9 t_test  0.22  0.85
#10 t_test  0.24  0.86
# … with 156 more rows

Upvotes: 1

Related Questions