Jack Landry

Reputation: 148

Tidyeval and apply family to add new variables to a dataframe

I'm trying to write a function to automate the creation of some new variables using tidyverse tools. I figured out my problem involves tidyeval, but I haven't quite figured out where I went wrong in the code below, which is just reproducing the variable name. As a second step, I'd like to do something besides a for loop to apply the function a bunch of times. I've read enough StackOverflow answers shaming for loops, but I can't find a worked example for using some kind of apply function creating new variables on an existing dataframe. Thanks!

library(tidyverse)
x = c(0,1,2,3,4)
y = c(0,2,4,5,8)
df <- data.frame(x,y)
df
simple_func <- function(x) {
  var_name <- paste0("pre_", x, "_months")
  var_name <-  enquo(var_name)
  df <- df %>%
    mutate(!! var_name := ifelse(x==y,1,0)) %>%
    mutate(!! var_name := replace_na(!! var_name))
  return(df)
}
simple_func(1)
#Desired result
temp <- data.frame("pre_1_months" = c(1,0,0,0,0))
temp
bind_cols(df,temp)

#Step 2, use some kind of apply function rather than a loop to apply this function sequentially
nums <- seq(1:10)
for (i in seq_along(nums)) {
  df <- simple_func(nums[i])
}
df

Upvotes: 0

Views: 72

Answers (2)

Allan Cameron

Reputation: 173858

To build on @akrun's answer, the more idiomatic approach is to pass df as the first parameter of your function and x as the second. You can vectorize the function by moving the loop inside it, so that it runs once for each element of x, using rlang::syms instead of sym. This also makes the code shorter, and you can drop the function into a pipe as if it were a dplyr verb.

simple_func <- function(df, x) 
{
    for(var_name in rlang::syms(paste0("pre_", x, "_months")))
    {
      df <- mutate(df, !! var_name := replace_na(ifelse(x==y,1,0)))
    }
    df
}

So now you can do:

df %>% simple_func(1:5)
#>   x y pre_1_months pre_2_months pre_3_months pre_4_months pre_5_months
#> 1 0 0            1            1            1            1            1
#> 2 1 2            0            0            0            0            0
#> 3 2 4            0            0            0            0            0
#> 4 3 5            0            0            0            0            0
#> 5 4 8            0            0            0            0            0
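
As a quick check of what the loop in simple_func iterates over, here is a minimal sketch of rlang::syms applied to the same paste0() call. The names x_vals and var_names are illustrative only and do not come from the answer.

```r
library(rlang)

# x_vals and var_names are hypothetical names used only for this illustration
x_vals <- 1:3
var_names <- syms(paste0("pre_", x_vals, "_months"))

# syms() returns a list holding one symbol per input string,
# which is why a single for loop can add one column per element of x
length(var_names)
is_symbol(var_names[[1]])
as_string(var_names[[3]])
```

By contrast, sym() accepts only a single string, which is why the one-variable version needs an outer loop.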

EDIT

Following the comment from Lionel Henry, and noting the OP's desire to avoid loops, here is a single function without explicit loops that can be used in a pipe with x of arbitrary length, and which doesn't rely on converting to symbols:

simple_func <- function(df, x) {
  f <- function(v) df <<- mutate(df, !!v := replace_na(ifelse(x == y, 1, 0)))
  lapply(paste0("pre_", x, "_months"), f)
  return(df)
}

This works the same way:

df %>% simple_func(1:10)
#>   x y pre_1_months pre_2_months pre_3_months pre_4_months pre_5_months pre_6_months
#> 1 0 0            1            1            1            1            1            1
#> 2 1 2            0            0            0            0            0            0
#> 3 2 4            0            0            0            0            0            0
#> 4 3 5            0            0            0            0            0            0
#> 5 4 8            0            0            0            0            0            0
#>   pre_7_months pre_8_months pre_9_months pre_10_months
#> 1            1            1            1             1
#> 2            0            0            0             0
#> 3            0            0            0             0
#> 4            0            0            0             0
#> 5            0            0            0             0
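
If the super-assignment (<<-) feels uncomfortable, one possible alternative is to thread the data frame through purrr::reduce. This is only a sketch of a different technique, not the method from the answer; add_flags, acc and nm are hypothetical names, and the sample data is copied from the question.

```r
library(dplyr)
library(purrr)

# Sample data from the question
df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 2, 4, 5, 8))

# add_flags is a hypothetical name. reduce() starts from df (.init) and
# passes the growing data frame (acc) plus the next column name (nm)
# to the function, so no assignment to an outer variable is needed.
add_flags <- function(df, x) {
  new_names <- paste0("pre_", x, "_months")
  reduce(
    new_names,
    function(acc, nm) mutate(acc, !!nm := ifelse(x == y, 1, 0)),
    .init = df
  )
}

out <- df %>% add_flags(1:3)
names(out)
```

As in the answer's version, a plain string on the left-hand side of := is enough here, so no conversion to a symbol is required.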

Upvotes: 1

akrun

Reputation: 887148

Because var_name is a string, we can use rlang::sym to convert it to a symbol and then unquote it with !!:

simple_func <- function(x) {
    var_name <- paste0("pre_", x, "_months")
    var_name <-  rlang::sym(var_name)
    df %>%
      mutate(!! var_name := ifelse(x==y,1,0)) %>%
      mutate(!! var_name := replace_na(!! var_name))

    }

Checking with the OP's loop:

nums <- seq(1:10)
for (i in seq_along(nums)) {
   df <- simple_func(nums[i])
 }

df
#  x y pre_1_months pre_2_months pre_3_months pre_4_months pre_5_months pre_6_months pre_7_months pre_8_months
#1 0 0            1            1            1            1            1            1            1            1
#2 1 2            0            0            0            0            0            0            0            0
#3 2 4            0            0            0            0            0            0            0            0
#4 3 5            0            0            0            0            0            0            0            0
#5 4 8            0            0            0            0            0            0            0            0
#  pre_9_months pre_10_months
#1            1             1
#2            0             0
#3            0             0
#4            0             0
#5            0             0
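
Every new column in this output is identical because, inside mutate(), the name x is looked up in the data frame first, so x == y compares the columns x and y rather than the function argument. That matches the desired result in the question. If comparing against the argument were the intent (an assumption, not something the question states), the .data and .env pronouns make the choice explicit. The function compare_x and its column names below are hypothetical.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 2, 4, 5, 8))

# compare_x is a hypothetical helper for illustration only
compare_x <- function(df, x) {
  mutate(
    df,
    uses_column   = as.numeric(.data$x == .data$y),  # column x vs column y
    uses_argument = as.numeric(.env$x == .data$y)    # function argument x vs column y
  )
}

out <- compare_x(df, 2)
out$uses_column    # 1 0 0 0 0
out$uses_argument  # 0 1 0 0 0
```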

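Alternatively, we could use map_dfc and change mutate to transmute, so that each call returns only the new column, and then bind the results back onto df: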

simple_func <- function(x) {
    var_name <- paste0("pre_", x, "_months")
    var_name <-  rlang::sym(var_name)
    df %>%
      transmute(!! var_name := ifelse(x==y,1,0)) %>%
      transmute(!! var_name := replace_na(!! var_name))

    }

library(purrr)
library(dplyr)
map_dfc(1:10, simple_func) %>% 
       bind_cols(df,.)
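
In purrr 1.0.0 and later, map_dfc is marked as superseded in favour of map() followed by list_cbind(). The sketch below shows that variant under the assumption that such a version is installed. flag_col is a hypothetical helper that takes the data frame as an argument instead of reading it from the global environment.

```r
library(dplyr)
library(purrr)

# Sample data from the question
df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 2, 4, 5, 8))

# flag_col is a hypothetical helper: it returns a one-column data frame
flag_col <- function(df, i) {
  nm <- paste0("pre_", i, "_months")
  transmute(df, !!nm := ifelse(x == y, 1, 0))
}

# Assumes purrr >= 1.0.0, where list_cbind() is available
flags <- map(1:3, function(i) flag_col(df, i)) %>% list_cbind()
result <- bind_cols(df, flags)
names(result)
```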

Upvotes: 1
