how to loop over dplyr mutate

Question

I am relatively new to R programming and have mainly used Stata over the past couple of years. I have a set of measures taken on the right (R) and left (L) eyes, and I need to create a 'treated' variable for each measure that holds the value from the treated side (given by the side variable).

I can do this simply enough using case_when in dplyr for a single variable, but how do I set this up in a loop, as I have quite a few to do?

Can you use paste0 to join strings to create new variable names on the fly?

library(dplyr)

# dummy data
side <- rep(c("R","L"), each = 5, times = 5)
R_var1 <- round(rnorm(50, 20, 5),2)
L_var1 <- round(rnorm(50, 20, 5),2)
R_var2 <- round(rnorm(50, 20, 5),2)
L_var2 <- round(rnorm(50, 20, 5),2)
R_var3 <- round(rnorm(50, 20, 5),2)
L_var3 <- round(rnorm(50, 20, 5),2)
df <- data.frame(cbind(side, R_var1, L_var1, R_var2, L_var2, R_var3, L_var3))

# create 'treated' variable in the single case
df <- df  %>%
  mutate(., var1_treated = case_when(side == "R" ~ R_var1,
                                     side == "L" ~ L_var1))
df

# create 'treated' variable over multiple variables in a loop
# this is where I am stuck

variables <- unique(substring(colnames(df)[2:7],3))

for(var in variables){
  df <- df  %>%
  mutate(., paste0(variables,"_treated") = case_when(side == "R" ~ paste0("R_",variables),
                                                     side == "L" ~ paste0("L_",variables)))
}

akrun · Accepted Answer

We can use across for this. Loop across the columns whose names start with 'R_var' and, in case_when, return the column's own value when side is 'R'; when side is 'L', get the matching column by replacing the 'R' prefix of the current column name (cur_column()) with 'L'. The .names argument appends the suffix _treated to the new column names, and rename_with then removes the 'R_' prefix from those columns.

library(dplyr)
library(stringr)
out <- df %>%
  mutate(across(starts_with('R_var'),
                # keep the R value, or look up the matching L_ column by name
                ~ case_when(side == 'R' ~ .,
                            side == 'L' ~ get(str_replace(cur_column(), '^R', 'L'))),
                .names = '{.col}_treated')) %>%
  # R_var1_treated -> var1_treated
  rename_with(~ str_remove(., '^R_'), ends_with('_treated'))

Output:

head(out)
#  side R_var1 L_var1 R_var2 L_var2 R_var3 L_var3 var1_treated var2_treated var3_treated
#1    R  18.81  17.39  21.94  17.34  15.67  19.84        18.81        21.94        15.67
#2    R  16.82  21.55  21.24     16  19.96  21.12        16.82        21.24        19.96
#3    R  20.96  11.45   7.04  19.93  22.19  13.62        20.96         7.04        22.19
#4    R  21.67  24.75  15.11  27.88   9.52  24.07        21.67        15.11         9.52
#5    R  20.86  13.77  23.07  24.45  16.81  15.24        20.86        23.07        16.81
#6    L  26.97  27.08  18.92  26.58  17.07  35.75        27.08        26.58        35.75
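
To check the logic by hand, here is a minimal sketch of the same across pattern on a small made-up data frame. The name toy and its values are hypothetical, chosen only so the result is easy to verify; they are not the OP's data.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: two measures, four rows, numeric columns
toy <- data.frame(
  side   = c("R", "L", "R", "L"),
  R_var1 = c(1, 2, 3, 4),
  L_var1 = c(10, 20, 30, 40),
  R_var2 = c(5, 6, 7, 8),
  L_var2 = c(50, 60, 70, 80)
)

toy_out <- toy %>%
  mutate(across(starts_with("R_var"),
                # R rows keep the R_ value; L rows take the matching L_ column
                ~ case_when(side == "R" ~ .x,
                            side == "L" ~ get(str_replace(cur_column(), "^R", "L"))),
                .names = "{.col}_treated")) %>%
  rename_with(~ str_remove(.x, "^R_"), ends_with("_treated"))

toy_out$var1_treated  # 1 20 3 40
toy_out$var2_treated  # 5 60 7 80
```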

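The OP's loop needs three changes: use the loop variable var rather than the whole vector variables, assign the dynamically built column name with := (unquoting it with !!), and fetch the column values with get (or with !! rlang::sym(paste0("R_", var))), because paste0 on its own returns only the column name as a string.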

for(var in variables){
  df <- df %>%
    mutate(!! paste0(var, "_treated") := case_when(side == "R" ~ get(paste0("R_", var)),
                                                   side == "L" ~ get(paste0("L_", var))))
}
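
The rlang::sym alternative mentioned above could look roughly like this. It is a sketch on a made-up data frame: toy, r_col and l_col are hypothetical names introduced for illustration, and the values are not the OP's data.

```r
library(dplyr)

# Hypothetical toy data with numeric measure columns
toy <- data.frame(
  side   = c("R", "L", "R", "L"),
  R_var1 = c(1, 2, 3, 4),
  L_var1 = c(10, 20, 30, 40),
  R_var2 = c(5, 6, 7, 8),
  L_var2 = c(50, 60, 70, 80)
)
variables <- c("var1", "var2")

for (var in variables) {
  # Turn the column names (strings) into symbols that can be unquoted with !!
  r_col <- rlang::sym(paste0("R_", var))
  l_col <- rlang::sym(paste0("L_", var))
  toy <- toy %>%
    mutate(!!paste0(var, "_treated") := case_when(side == "R" ~ !!r_col,
                                                  side == "L" ~ !!l_col))
}

toy$var1_treated  # 1 20 3 40
```

Both versions give the same result; the symbol version refers only to columns of the data frame, whereas get can also pick up an object of the same name from the calling environment.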

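Or, in base R, use split.default to split the columns into a list grouped by measure name (the column name without its 'R_' or 'L_' prefix), then pick the 'R' or 'L' value within each group with ifelse: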

lst1 <- lapply(split.default(df[-1], sub('^[RL]_', '', names(df)[-1])),
               function(x) ifelse(df$side == 'R', x[[1]], x[[2]]))
df[paste0(names(lst1), '_treated')] <- lst1
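
The split.default version picks x[[1]] for 'R' and x[[2]] for 'L', so it relies on each R_ column appearing before its L_ partner. If that ordering is not guaranteed, a base R sketch that looks the columns up by name may be safer. The data frame toy and the name measures below are hypothetical, used only for illustration.

```r
# Hypothetical toy data; note that L_var2 comes before R_var2 here
toy <- data.frame(
  side   = c("R", "L", "R", "L"),
  R_var1 = c(1, 2, 3, 4),
  L_var1 = c(10, 20, 30, 40),
  L_var2 = c(50, 60, 70, 80),
  R_var2 = c(5, 6, 7, 8)
)

# Measure names: column names with the R_/L_ prefix removed
measures <- unique(sub("^[RL]_", "", names(toy)[-1]))

for (m in measures) {
  # Select each side's column by its full name instead of by position
  toy[[paste0(m, "_treated")]] <- ifelse(toy$side == "R",
                                         toy[[paste0("R_", m)]],
                                         toy[[paste0("L_", m)]])
}

toy$var2_treated  # 5 60 7 80
```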

NOTE: Building the data.frame via cbind first creates a matrix, and a matrix can hold only a single type. Because the first column (side) is character, every column is converted to character. Instead, pass the vectors directly to data.frame:

df <- data.frame(side, R_var1, L_var1, R_var2, L_var2, R_var3, L_var3)
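
The difference is easy to see on a two-row example. This is a small sketch with made-up values; via_cbind and direct are hypothetical names.

```r
side   <- c("R", "L")
R_var1 <- c(18.81, 26.97)

# cbind builds a character matrix first, so the numbers become strings
via_cbind <- data.frame(cbind(side, R_var1))
# data.frame keeps each vector's own type
direct    <- data.frame(side, R_var1)

class(via_cbind$R_var1)  # "character" on R >= 4.0 (a factor on older versions)
class(direct$R_var1)     # "numeric"
```

With the cbind version, any arithmetic or numeric comparison on the measure columns would need an explicit as.numeric conversion first.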
