Using purrr to extract values from nested dataframe based on condition

Question

I'm working with a set of patient test results, some positive and some negative. I reduce the data to one row per patient using tidyr::nest(), then extract values for the first positive test only, using purrr::map() and a function I've written. My dataset isn't huge (~40k unique patients, ~110k test results), but I gave up running my script after 40 minutes. I'm sure there's a better way of extracting these values, but I'm struggling to work it out. The code chunk below illustrates the method I'm using (though obviously this small example runs in no time).

library(tidyverse)

example_data <- tribble(
  
  ~patient, ~is_first_positive, ~score_1, ~score_2,
  "A", F, 10, 45,
  "A", T, 16, 76,
  "A", F, 24, 86,
  "B", T, 17, 5,
  "B", F, 24, 22,
  "B", F, 55, 97,
  "C", F, 67, 48,
  "C", F, 23, 38,
  "C", F, 45, 16
  
)

example_data <- example_data %>% 
  group_by(patient) %>% 
  nest()

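# Function to extract a value from rtn_field for the row where logical_field is TRUE.
# logical_field is a column name given as a string; rtn_field is a bare column name.
get_field <- function(df, logical_field, rtn_field) {

  # filter_() is deprecated; the .data pronoun looks the column up by its string name
  df <- df %>% filter(.data[[logical_field]])

  if (nrow(df) == 0) {
    # no positive test for this patient
    return(NA_integer_)
  } else {
    df %>% pull({{ rtn_field }}) %>% as.integer()
  }

}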

# Use purrr to run function against each nested df
example_data <- example_data %>% 
  mutate(first_positive_score1 = map_int(data, ~get_field(., "is_first_positive", score_1)),
         first_positive_score2 = map_int(data, ~get_field(., "is_first_positive", score_2)))

rjen · Accepted Answer

If you can forgive the long lines, you can use map() in the following way.

library(dplyr)
library(tibble)
library(purrr)

example_data %>%
  mutate(score_1 = as.double(map(data, ~ deframe(.x[2])[which(deframe(.x[1]) == TRUE)])),
         score_2 = as.double(map(data, ~ deframe(.x[3])[which(deframe(.x[1]) == TRUE)]))) 

#   patient data             score_1 score_2
#   <chr>   <list>             <dbl>   <dbl>
# 1 A       <tibble [3 × 3]>      16      76
# 2 B       <tibble [3 × 3]>      17       5
# 3 C       <tibble [3 × 3]>      NA      NA
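
To make the one-liners above easier to follow, here is a rough sketch of what the inner expression does for a single nested tibble. The names one_patient, no_positive, flags, scores, hit, result_a, empty and result_c are made up for illustration, and the two small tibbles are copied by hand from patients "A" and "C" in the question's example data.

```r
library(tibble)

# One patient's nested tibble, as produced by group_by(patient) %>% nest()
# (rows copied from patient "A" in the question)
one_patient <- tribble(
  ~is_first_positive, ~score_1, ~score_2,
  FALSE, 10, 45,
  TRUE,  16, 76,
  FALSE, 24, 86
)

# deframe() on a one-column tibble returns that column as a plain vector
flags  <- deframe(one_patient[1])   # is_first_positive
scores <- deframe(one_patient[2])   # score_1

# which() gives the row position of the positive test (integer(0) if there is none)
hit <- which(flags == TRUE)
result_a <- scores[hit]

# A patient with no positive test (rows copied from patient "C")
no_positive <- tribble(
  ~is_first_positive, ~score_1, ~score_2,
  FALSE, 67, 48,
  FALSE, 23, 38,
  FALSE, 45, 16
)

# Indexing with integer(0) yields a zero-length vector ...
empty <- deframe(no_positive[2])[which(deframe(no_positive[1]) == TRUE)]

# ... which as.double() turns into NA when it coerces the list returned by map()
result_c <- as.double(list(empty))
```

This is why patient C ends up with NA in the output: map() returns a zero-length element for that patient, and the as.double() wrapper coerces it to a missing value. Note that the expression picks columns by position, so it relies on is_first_positive, score_1 and score_2 being the first, second and third columns of each nested tibble.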
