daisy_speed_713

Reputation: 43

How to use purrr when input to function is dataframe name

I have a function with three inputs: the name of data frame x, the name of data frame y, and a term. The function looks something like this:

my_function <- function(df_x, df_y, term) {
  # Join the two data frames, drop duplicate rows, and extract `variable`
  list_ids <- df_x %>%
    left_join(df_y, by = c("idx" = "idy")) %>%
    distinct() %>%
    pull(variable)

  another_function(a = list_ids, b = term, other_parameters = TRUE)
}

I have a tibble containing df_x, df_y, and terms:

df_x       df_y        term
<chr>      <chr>       <chr>
df_123a    df_123b     term_m                       
df_123a    df_123b     term_n   
df_456a    df_456b     term_m   
df_456a    df_456b     term_n
df_789a    df_789a     term_m
df_789a    df_789b     term_n 

I want to use map to run the function once for each row of this tibble. However, the data frame names are stored as character strings, and the function performs a left join of df_x and df_y. How can I accomplish this?

Upvotes: 0

Views: 438

Answers (3)

mnist

Reputation: 6954

You can use pmap() to iterate over the rows of the tibble and get() to retrieve the objects from the environment by name:

library(dplyr)

# tibble with information
match_tib <- tibble(df_1 = c("df1", "df2"),
                    df_2 = c("df2", "df1"),
                    temp = 1:2)

# dfs to join
df1 <- tibble(a = 1:2,
              b = 10)
df2 <- tibble(a = 2:3,
              c = 20)

# run pmap
purrr::pmap(match_tib, function(df_1, df_2, temp) {
  df_new <- get(df_1) %>% 
    left_join(get(df_2), by = "a")
  df_new %>% mutate(x = temp)
})
#> [[1]]
#> # A tibble: 2 x 4
#>       a     b     c     x
#>   <int> <dbl> <dbl> <int>
#> 1     1    10    NA     1
#> 2     2    10    20     1


#> [[2]]
#> # A tibble: 2 x 4
#>       a     c     b     x
#>   <int> <dbl> <dbl> <int>
#> 1     2    20    10     2
#> 2     3    20    NA     2
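Adapted to the shape of the function in the question, a minimal sketch might look like the following. The function name `my_function`, the toy data frames, and the body of `another_function()` are hypothetical stand-ins, since the question does not show them. The only real change is that the function receives names and calls get() on them.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the data frames named in the question
df_123a <- tibble(idx = 1:3)
df_123b <- tibble(idy = 1:2, variable = c("p", "q"))

# Hypothetical stand-in for the asker's another_function()
another_function <- function(a, b, other_parameters = TRUE) {
  paste(b, paste(a, collapse = ","), sep = ": ")
}

# The function takes names (strings) and looks the objects up with get()
my_function <- function(df_x, df_y, term) {
  list_ids <- get(df_x) %>%
    left_join(get(df_y), by = c("idx" = "idy")) %>%
    distinct() %>%
    pull(variable)

  another_function(a = list_ids, b = term, other_parameters = TRUE)
}

my_tibble <- tibble(
  df_x = c("df_123a", "df_123a"),
  df_y = c("df_123b", "df_123b"),
  term = c("term_m", "term_n")
)

# pmap() matches the tibble's column names to the function's argument names
results <- pmap(my_tibble, my_function)
```

Because pmap() passes columns by name, the tibble's columns need to be called df_x, df_y, and term for this to line up with the function's arguments.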

Upvotes: 2

Arun Chavan

Reputation: 31

Wrapping the names of your data frames in get() should work, I think. Your map() call would then look something like this:

purrr::map(
  # seq_len() also behaves correctly when the tibble has zero rows
  .x = seq_len(nrow(my_tibble)),
  .f = ~ my_function(df_x = get(my_tibble$df_x[.x]),
                     df_y = get(my_tibble$df_y[.x]),
                     term = my_tibble$term[.x])
)
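Here is a self-contained sketch of that call with toy inputs. The data frames `df_a` and `df_b`, the contents of `my_tibble`, and the body of `my_function` are made up for illustration. The point it shows is that get() runs at the call site, so the function itself can keep expecting data frames and does not need to change.

```r
library(dplyr)
library(purrr)

# Hypothetical data frames and lookup tibble, for illustration only
df_a <- tibble(idx = 1:2, value = c(10, 20))
df_b <- tibble(idy = 1:2, variable = c("u", "v"))

my_tibble <- tibble(df_x = "df_a", df_y = "df_b", term = c("term_m", "term_n"))

# Hypothetical function that expects data frames, not names
my_function <- function(df_x, df_y, term) {
  df_x %>%
    left_join(df_y, by = c("idx" = "idy")) %>%
    mutate(label = term)
}

# get() turns each stored name into the data frame before my_function() runs
results <- map(
  seq_len(nrow(my_tibble)),
  ~ my_function(df_x = get(my_tibble$df_x[.x]),
                df_y = get(my_tibble$df_y[.x]),
                term = my_tibble$term[.x])
)
```

The result is a list with one joined data frame per row of `my_tibble`.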

Upvotes: 0

Andy Baxter

Reputation: 7646

A base R way of doing it is to use eval(parse(text = 'string')), which evaluates the string as code and so returns the object the name refers to. Here's an example of a function that binds the rows of data frames whose names are stored as strings in a data frame:

library(tidyverse)

df1 <- tibble(x = "a", y = 1)
df2 <- tibble(x = "b", y = 2)

dfs_to_process <- tibble(dfa = "df1", dfb = "df2")

join_dfs <- function(df1_name, df2_name) {
  
  eval(parse(text = df1_name)) %>% 
    bind_rows(eval(parse(text = df2_name)))
  
}

dfs_to_process %>% 
  mutate(result = map2(dfa, dfb, join_dfs)) %>% 
  pull(result)
#> [[1]]
#> # A tibble: 2 x 2
#>   x         y
#>   <chr> <dbl>
#> 1 a         1
#> 2 b         2

Created on 2021-03-24 by the reprex package (v1.0.0)

Update

The get() function would make this a lot neater (thanks @mnist!):

join_dfs <- function(df1_name, df2_name) {
  get(df1_name) %>%
    bind_rows(get(df2_name))
}

Rather than edit my original response, I'll leave it as is, since it would be essentially the same solution if I changed it over.
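One practical difference between the two approaches, shown as a small sketch: eval(parse()) evaluates whatever code the string contains, while get() only looks up a name. The strings below are made up for illustration.

```r
df1 <- data.frame(x = "a", y = 1)

# Both approaches return the same object when the string is a plain name
same_object <- identical(eval(parse(text = "df1")), get("df1"))

# eval(parse()) runs any expression found in the string
evaluated <- eval(parse(text = "nrow(df1) + 1"))

# get() treats the whole string as a name, so the same input is an error
lookup_failed <- tryCatch(
  {
    get("nrow(df1) + 1")
    FALSE
  },
  error = function(e) TRUE
)
```

So if the names come from a tibble that someone else can edit, get() is the narrower tool: a string that is not an existing object name fails instead of being executed.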

Upvotes: 0
