O René

Reputation: 315

from for-loops to purrr: named sublist assignment

I am finally trying to make the jump from for-loops to purrr. The basic examples are intuitive enough, but translating for-loops has been tricky, especially when the output has to be assigned to list elements. Consider the following example, where I assign the output of a nested for-loop to a named list element.

library(tidyverse)

# output list
loop_list <- list()

# iterate over colors
for (color in diamonds %>% distinct(color) %>% pull()){
  
  # sublist for each color
  loop_list[[color]] <- list()
  
  # iterate over cuts
  for(cut in diamonds %>% distinct(cut) %>% pull()){
    
    # filter data
    data <- diamonds %>% 
      filter(color == color & cut == cut)
    
    # define output
    out <- data %>% 
      pull(price) %>% 
      mean()
    
    # assign output to sublist of its color
    loop_list[[color]][[cut]] <- out
    
    # clean up filtered data set
    rm(data)
  }
}

This nested loop assigns each output object to a properly named sublist for its color. My purrr attempt produces something similar, but without the named sublists: all output objects end up in one flat list, which is not what I want.

grid <- expand_grid(color = diamonds %>% distinct(color) %>% pull(),
                    cut = diamonds %>% distinct(cut) %>% pull())

myfunc <- function(data, color, cut){
  
  # create output object
  out <- data %>% 
    # filter data
    filter(color == color & cut == cut) %>%  
    pull(price) %>% 
    mean()
  
  # return output
  return(out)
}

purrr_list <- grid %>%
  pmap(myfunc, data = diamonds)

Is there a way to arrive at the same output with purrr? I am aware that global assignment with <<- is a possibility, but this is generally discouraged, from what I understand.

Upvotes: 1

Views: 144

Answers (1)

akrun

Reputation: 887128

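It is better to rename the function arguments so they do not clash with the column names. Inside filter(), color == color compares the column with itself, so every row is kept and the argument is never used.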

myfunc <- function(data, col1, col2){
  
  # filter data
  data <- data %>% 
    filter(color == col1 & cut == col2)
  
  # define output
  out <- data %>% 
    pull(price) %>% 
    mean()
  
  # return output
  return(out)
}


grid %>% 
  pmap_dbl(~ myfunc(diamonds, ..1, ..2))
#[1] 2597.550 3538.914 3423.644 3214.652 3682.312 4451.970 5946.181 5078.533 5255.880 4685.446 4918.186 6294.592 4574.173 5103.513 4975.655
#[16] 3889.335 5216.707 4276.255 4535.390 5135.683 3374.939 4324.890 3495.750 3778.820 3827.003 3720.706 4500.742 4123.482 3872.754 4239.255
#[31] 2629.095 3631.293 3405.382 3470.467 4291.061
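
To see why the renaming matters, here is a minimal sketch that counts the rows surviving the filter in each variant. The function names clash, no_clash and with_env are made up for this illustration. The .env pronoun is an alternative to renaming and assumes a reasonably recent dplyr.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Inside filter(), `color` and `cut` resolve to the columns first, so
# arguments with the same names are never consulted.
clash <- function(data, color, cut) {
  data %>% filter(color == color & cut == cut) %>% nrow()
}

# Renamed arguments (as in the answer) no longer collide with the columns.
no_clash <- function(data, col1, col2) {
  data %>% filter(color == col1 & cut == col2) %>% nrow()
}

# Alternative (an assumption, not part of the answer): keep the names and
# disambiguate with the .data and .env pronouns.
with_env <- function(data, color, cut) {
  data %>%
    filter(.data$color == .env$color & .data$cut == .env$cut) %>%
    nrow()
}

n_clash <- clash(diamonds, "E", "Ideal")     # every row passes the filter
n_fixed <- no_clash(diamonds, "E", "Ideal")  # only the E / Ideal rows
n_env   <- with_env(diamonds, "E", "Ideal")  # same rows as n_fixed
```

Comparing n_clash with nrow(diamonds) shows that the clashing version filters nothing, which is why every cell would end up with the overall mean price.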

If we need a nested output, split the grid by color and name each sublist by cut:

library(dplyr)
grid %>% 
  split(.$color) %>% 
  map(~ pmap(.x, ~ myfunc(diamonds, ..1, ..2)) %>% 
        setNames(.x$cut))

Output:

#$D
#$D$Ideal
#[1] 2629.095

#$D$Premium
#[1] 3631.293

#$D$Good
#[1] 3405.382

#$D$`Very Good`
#[1] 3470.467

#$D$Fair
#[1] 4291.061


#$E
#$E$Ideal
#[1] 2597.55

#$E$Premium
#[1] 3538.914

#$E$Good
#[1] 3423.644
# ..
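
As a possible alternative (a sketch, not the approach in the answer above), the same nested shape can be built by computing all cell means in one grouped summary and then splitting by color. The names cell_means, mean_price and nested are made up for this illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(purrr)
library(ggplot2)  # provides the diamonds data set

# One row per color/cut combination with its mean price.
cell_means <- diamonds %>%
  group_by(color, cut) %>%
  summarise(mean_price = mean(price), .groups = "drop")

# Split by color, then turn each piece into a list named by cut.
nested <- cell_means %>%
  split(.$color) %>%
  map(~ setNames(as.list(.x$mean_price), as.character(.x$cut)))

# Access works like the loop version, e.g. nested$D$Ideal or
# nested[["E"]][["Very Good"]].
```

The sublists here follow the factor level order of cut, so the elements may appear in a different order than in the output above. The names and values should be the same.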

Upvotes: 1
