David

Reputation: 41

Map over multiple data frames with purrr and keep the modified data frames as output

I have a question about the map() function from the purrr package.

As an example with the mtcars dataset:

library(dplyr)
library(purrr)

# Create a second data frame
mtcars2 <- mtcars

# Change one variable to distinguish the two
mtcars2$mpg <- mtcars2$mpg / 2

# Create the list
dflist <- list(mtcars, mtcars2)

# A simple example function
my_fun <- function(x) {
  x %>%
    summarise(`sum of mpg` = sum(mpg),
              `sum of cyl` = sum(cyl))
}

# Using map, this works and returns the desired results as a list
list_results <- map(dflist, my_fun)

However, I need the modified mtcars and mtcars2 saved as separate R objects (data frames), not just as elements of a list.

Thanks in advance!

Upvotes: 1

Views: 2528

Answers (2)

Brad Cannell

Reputation: 3200

Here is a solution using purrr::walk() with get() and assign(). It is similar to the other answers, but not identical.

library(dplyr)
library(purrr)
data(mtcars)

Create the second data frame.

mtcars2 <- mtcars
mtcars2$mpg <- mtcars2$mpg / 2

Create the function to apply to each data frame.

sum_mpg_cyl <- function(.data) {
  .data %>% 
    summarise(
      `sum of mpg` = sum(mpg),
      `sum of cyl` = sum(cyl)
    )
}

Apply sum_mpg_cyl() to mtcars and mtcars2, saving two data frames of summary stats under the same names in the global environment. Note that this replaces the mtcars and mtcars2 objects there with their summaries. A potential advantage of this method is that you do not need to create a separate list of data frames.

walk(
  .x = c("mtcars", "mtcars2"),
  .f = function(df_name) {
    # Get the data frame from the global environment
    df <- get(df_name, envir = .GlobalEnv)
    
    # Calculate the summary statistics
    df <- sum_mpg_cyl(df)
    
    # Save the data frames containing summary statistics back to the global 
    # environment
    assign(df_name, df, envir = .GlobalEnv)
  }
)

Alternatively, I would probably do the summarising directly inside the anonymous function and save the two data frames of summary stats under different names, like this:

# Reset the data
data(mtcars)
mtcars2 <- mtcars
mtcars2$mpg <- mtcars2$mpg / 2
walk(
  .x = c("mtcars", "mtcars2"),
  .f = function(df_name) {
    # Get the data frame from the global environment
    df <- get(df_name, envir = .GlobalEnv)
    
    # Calculate the summary statistics
    df <- df %>% 
      summarise(
        `sum of mpg` = sum(mpg),
        `sum of cyl` = sum(cyl)
      )
    
    # Rename the data frames containing summary statistics to distinguish
    # them from the input data frames
    new_df_name <- paste(df_name, "stats", sep = "_")
    
    # Save the data frames containing summary statistics back to the global 
    # environment
    assign(new_df_name, df, envir = .GlobalEnv)
  }
)
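
If the only goal is to avoid building the list by hand, a middle ground is to let base R's mget() collect the data frames by name and keep the summaries together in one named list, rather than writing to the global environment. This is a minimal sketch, assuming the same two data frames as above; df_list and stats_list are hypothetical names chosen for illustration.

```r
library(dplyr)
library(purrr)

mtcars2 <- mtcars
mtcars2$mpg <- mtcars2$mpg / 2

# mget() returns a named list of the objects with these names.
# inherits = TRUE lets the lookup continue into attached packages,
# which is where the built-in mtcars lives.
df_list <- mget(c("mtcars", "mtcars2"), envir = environment(), inherits = TRUE)

# map() keeps the names, so each summary stays labelled by its source
stats_list <- map(df_list, function(df) {
  df %>%
    summarise(
      `sum of mpg` = sum(mpg),
      `sum of cyl` = sum(cyl)
    )
})

# Individual results are then available by name
stats_list$mtcars2
```

Keeping the results in a list avoids overwriting anything in the global environment, and the individual data frames can still be pulled out later if needed.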

Upvotes: 0

missuse

Reputation: 19716

Here is an attempt:

library(purrr)
library(tidyverse)

mtcars2 <- mtcars 
mtcars2$mpg <- mtcars2$mpg / 2
dflist <- list(mtcars,mtcars2)

To save the objects one would need to give them specific names, and use:

assign("name", object, envir = .GlobalEnv)

Here is one way to achieve that:

my_fun <- function(x, list) {
  listi <- list[[x]]
  # Use the list passed in as an argument rather than the global dflist
  assign(paste0("object_from_function_", x), listi, envir = .GlobalEnv)
  x <- listi %>%
    summarise(`sum of mpg` = sum(mpg), 
              `sum of cyl` = sum(cyl)
    )
  return(x)
}

my_fun takes two arguments: an index, supplied by seq_along(dflist) and used to generate the object names, and the list to be processed.

This saves two objects, object_from_function_1 and object_from_function_2:

list_results <- map(seq_along(dflist), my_fun, dflist)
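
Note that my_fun above stores the input data frames under the new names, while the summaries end up in list_results. If the summaries themselves should be saved as objects, one possible variation is to name the list and use purrr::iwalk(), which passes each element together with its name. This is only a sketch; the "_summary" suffix is a hypothetical naming choice.

```r
library(dplyr)
library(purrr)

mtcars2 <- mtcars
mtcars2$mpg <- mtcars2$mpg / 2

# Naming the list elements gives each result a meaningful label
dflist <- list(mtcars = mtcars, mtcars2 = mtcars2)

# iwalk() calls the function with each element and its name
iwalk(dflist, function(df, name) {
  stats <- df %>%
    summarise(`sum of mpg` = sum(mpg),
              `sum of cyl` = sum(cyl))
  # "_summary" is a hypothetical suffix chosen for this example
  assign(paste0(name, "_summary"), stats, envir = .GlobalEnv)
})
```

After this runs, mtcars_summary and mtcars2_summary exist in the global environment and the original data frames are left untouched.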

Another approach would be to use list2env() outside of the map() call, as akrun suggested:

dflist <- list(mtcars,mtcars2)
names(dflist) <- c("mtcars","mtcars2")
list2env(dflist, envir = .GlobalEnv) #this will create two objects `mtcars` and `mtcars2`

Then run map() after you have created the objects, as you have already done.
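
The same list2env() idea can also be applied to the output of map(), which may be closer to what the question asks for. A minimal sketch, assuming the list is named with the desired output names (mtcars_stats and mtcars2_stats are hypothetical names): map() preserves the names, and list2env() then creates one object per element.

```r
library(dplyr)
library(purrr)

mtcars2 <- mtcars
mtcars2$mpg <- mtcars2$mpg / 2

# The element names become the names of the objects created below
dflist <- list(mtcars_stats = mtcars, mtcars2_stats = mtcars2)

# map() returns a list with the same names as its input
list_results <- map(dflist, function(df) {
  summarise(df,
            `sum of mpg` = sum(mpg),
            `sum of cyl` = sum(cyl))
})

# Create one data frame per list element in the global environment
list2env(list_results, envir = .GlobalEnv)
```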

Upvotes: 4
