Scipione Sarlo
Scipione Sarlo

Reputation: 1498

How to write multiple excel files with multiple sheets based on a variable of a split data frame in R (tidyverse)

I would like to split a data frame in order to produce multiple excel files containing multiple sheets on the basis of another variable of the original split data frame. The data frame I used is the toy data set mtcars. I split it by cyl in order to create multiple files with sheets based on the variable gear.

So, I would expect to get three excel files with the following names:

Each of them containing:

What I did it seems to overwrite the files.

This is what I did:

library(tidyverse)
library(writexl)

# I split the data frame
list_of_cars_by_cyl <- mtcars %>%
      dplyr::group_split(cyl)

# I gave names to split data frame elements
names(list_of_cars_by_cyl) <- list_of_cars_by_cyl %>%
      purrr::map(~pull(.,cyl)) %>% # individua le modalit? della variabile area
      purrr::map(~as.character(.)) %>% # converte il fattore in character
      purrr::map(~unique(.))

nomi <- names(list_of_cars_by_cyl)

# I create a function in order to save split data frames in .xlsx with several sheets based on a second variable
save_to_excel <- function(x) {
      # list by new variable
      list_of_cars_by_gear <- x %>%
            dplyr::group_split(gear)
      # name list's elements
      names(list_of_cars_by_gear) <- list_of_cars_by_gear %>%
            purrr::map(~pull(., gear)) %>% # individua le modalit? della variabile area
            purrr::map(~as.character(.)) %>% # converte il fattore in character
            purrr::map(~unique(.))
      # save to .xlsx
      list_of_cars_by_gear %>%
            writexl::write_xlsx(path = paste(cartelle[5], paste0("Cars_by_cyl_", nomi, "_", format(Sys.time(), format = "%d%m%Y_%H%M%S"), ".xlsx"), sep = "/"))
}

# run the function iteratively
list_of_cars_by_cyl %>% 
      purrr::map(save_to_excel)

Upvotes: 0

Views: 1219

Answers (1)

pholzm
pholzm

Reputation: 1784

In your example, you are using the same file name for each list element - nomi is used identically in every call to save_to_excel. There are (at least) two ways to resolve this: construct the correct file name completely inside the save_to_excel function, or simulateneously iterate over both list_of_cars_by_cyl and nomi with purrr::map2.

Both options produce the expected output (3 Excel files containing the cyl number in the name, each with the respective sheets for the gear split).

Option 1

Get the cyl part of the file name from the dataframe passed to save_to_excel:

save_to_excel <- function(x) {
      # list by new variable
      list_of_cars_by_gear <- x %>%
            dplyr::group_split(gear)
      # name list's elements
      names(list_of_cars_by_gear) <- list_of_cars_by_gear %>%
            purrr::map(~pull(., gear)) %>% # individua le modalit? della variabile area
            purrr::map(~as.character(.)) %>% # converte il fattore in character
            purrr::map(~unique(.))

      # cyl part for current file name
      this_cyl <- unique(x$cyl)
      
      # save to .xlsx
      list_of_cars_by_gear %>%
            writexl::write_xlsx(path = paste(
              #cartelle[5], #not defined example code,
              paste0("Cars_by_cyl_", this_cyl, "_", format(Sys.time(), format = "%d%m%Y_%H%M%S"), ".xlsx"), 
              sep = "/"))
}

list_of_cars_by_cyl %>% 
  purrr::map(save_to_excel)

#$`4`
#[1] "...\\Cars_by_cyl_4_25032022_220454.xlsx"
#$`6`
#[1] "...\\Cars_by_cyl_6_25032022_220454.xlsx"
#$`8`
#[1] "...\\Cars_by_cyl_8_25032022_220454.xlsx"

Option 2

Add second argument to save_to_excel and iterate over dataframes and names:

save_to_excel <- function(x, name) {
  # list by new variable
  list_of_cars_by_gear <- x %>%
    dplyr::group_split(gear)
  # name list's elements
  names(list_of_cars_by_gear) <- list_of_cars_by_gear %>%
    purrr::map(~pull(., gear)) %>% # individua le modalit? della variabile area
    purrr::map(~as.character(.)) %>% # converte il fattore in character
    purrr::map(~unique(.))
  # save to .xlsx
  list_of_cars_by_gear %>%
    writexl::write_xlsx(path =  paste(
              #cartelle[5], #not defined example code,
              paste0("Cars_by_cyl_", name, "_", format(Sys.time(), format = "%d%m%Y_%H%M%S"), ".xlsx"), 
              sep = "/"))
}

# run the function iteratively
# over dataframe list and nomi
list_of_cars_by_cyl %>% 
  purrr::map2(., nomi, save_to_excel)

#$`4`
#[1] "...\\Cars_by_cyl_4_25032022_221107.xlsx"
#$`6`
#[1] "...\\Cars_by_cyl_6_25032022_221107.xlsx"
#$`8`
#[1] "...\\Cars_by_cyl_8_25032022_221107.xlsx"

purrr options

As the question also asks about understanding the purrr logic, here are a few different ways how the last step could have been formulated instead. All produce the same result, and which is most easy to use/understand is imho a matter of personal preference.

# refer to first, second argument with .x, .y
purrr::map2(list_of_cars_by_cyl, nomi, ~save_to_excel(x = .x, name = .y))
# refer to first, second argument with ..1, ..2
purrr::map2(list_of_cars_by_cyl, nomi, ~save_to_excel(x = ..1, name = ..2))
# define anonymous function with appropriate number of arguments
purrr::map2(list_of_cars_by_cyl, nomi, function(x, y) {save_to_excel(x, y)})

Upvotes: 1

Related Questions