Reputation: 1305
I am relatively new to R and have mainly used Stata over the past couple of years. I have a set of measures taken on right (R) and left (L) eyes, and I need to create a 'treated' variable for each measure that contains the data for the treated side (given by the side variable).
I can do this easily for a single variable using case_when in dplyr, but how do I set this up in a loop, as I have quite a few variables to process?
Can you use paste0 to join strings to create new variable names on the fly?
# dummy data
side <- rep(c("R","L"), each = 5, times = 5)
R_var1 <- round(rnorm(50, 20, 5),2)
L_var1 <- round(rnorm(50, 20, 5),2)
R_var2 <- round(rnorm(50, 20, 5),2)
L_var2 <- round(rnorm(50, 20, 5),2)
R_var3 <- round(rnorm(50, 20, 5),2)
L_var3 <- round(rnorm(50, 20, 5),2)
df <- data.frame(cbind(side, R_var1, L_var1, R_var2, L_var2, R_var3, L_var3))
# create 'treated' variable in the single case
df <- df %>%
  mutate(., var1_treated = case_when(side == "R" ~ R_var1,
                                     side == "L" ~ L_var1))
df
# create 'treated' variable over multiple variables in a loop
# this is where I am stuck
variables <- unique(substring(colnames(df)[2:7],3))
for(var in variables){
df <- df %>%
mutate(., paste0(variables,"_treated") = case_when(side == "R" ~ paste0("R_",variables),
side == "L" ~ paste0("L_",variables)))
}
Upvotes: 2
Views: 881
Reputation: 887901
We could use across for this. Loop across the columns whose names start with 'R_var' (starts_with) and build the condition in case_when: if 'side' is 'R', return the column value; if 'side' is 'L', get the value of the matching column by replacing the prefix 'R' with 'L' in the current column name (cur_column()). Name the returned columns through .names by appending the suffix _treated, and finally remove the 'R_' prefix from those new columns.
library(dplyr)
library(stringr)
out <- df %>%
  mutate(across(starts_with('R_var'),
                ~ case_when(side == 'R' ~ .,
                            side == 'L' ~ get(str_replace(cur_column(), '^R', 'L'))),
                .names = '{.col}_treated')) %>%
  rename_with(~ str_remove(., '^R_'), ends_with('_treated'))
-output
head(out)
# side R_var1 L_var1 R_var2 L_var2 R_var3 L_var3 var1_treated var2_treated var3_treated
#1 R 18.81 17.39 21.94 17.34 15.67 19.84 18.81 21.94 15.67
#2 R 16.82 21.55 21.24 16 19.96 21.12 16.82 21.24 19.96
#3 R 20.96 11.45 7.04 19.93 22.19 13.62 20.96 7.04 22.19
#4 R 21.67 24.75 15.11 27.88 9.52 24.07 21.67 15.11 9.52
#5 R 20.86 13.77 23.07 24.45 16.81 15.24 20.86 23.07 16.81
#6 L 26.97 27.08 18.92 26.58 17.07 35.75 27.08 26.58 35.75
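The two column-name manipulations in the pipeline above can be checked in isolation. This is a minimal sketch, assuming stringr is installed; the names r_name, l_name and treated_name are only illustrative, and "R_var1" is taken from the question's dummy data.

```r
library(stringr)

# Inside across(), cur_column() returns the name of the current column,
# e.g. "R_var1". Swapping the leading "R" for "L" gives the name of the
# matching left-eye column.
r_name <- "R_var1"
l_name <- str_replace(r_name, "^R", "L")
l_name
# [1] "L_var1"

# .names = '{.col}_treated' first yields "R_var1_treated";
# rename_with() then strips the leading "R_".
treated_name <- str_remove(paste0(r_name, "_treated"), "^R_")
treated_name
# [1] "var1_treated"
```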
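In the OP's code, we need to assign with := and use get to retrieve the column values (or use !! rlang::sym(paste0("R_", var))). The loop body should also refer to the loop variable var rather than the whole variables vector.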
for(var in variables){
  df <- df %>%
    mutate(!! paste0(var, "_treated") := case_when(side == "R" ~ get(paste0("R_", var)),
                                                   side == "L" ~ get(paste0("L_", var))))
}
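The rlang::sym alternative mentioned above could look roughly like the sketch below. It assumes dplyr (and its dependency rlang) is installed; the small data frame uses made-up values rather than the question's random data, and r_col and l_col are illustrative helper names.

```r
library(dplyr)

# Small illustrative data set (hypothetical values)
df <- data.frame(
  side   = c("R", "R", "L", "L"),
  R_var1 = c(1, 2, 3, 4),
  L_var1 = c(10, 20, 30, 40),
  R_var2 = c(5, 6, 7, 8),
  L_var2 = c(50, 60, 70, 80)
)

# Strip the "R_" / "L_" prefix to get the distinct measure names
variables <- unique(sub("^[RL]_", "", names(df)[-1]))

for (var in variables) {
  # Turn the column names (strings) into symbols that !! can unquote
  r_col <- rlang::sym(paste0("R_", var))
  l_col <- rlang::sym(paste0("L_", var))
  df <- df %>%
    mutate(!!paste0(var, "_treated") := case_when(side == "R" ~ !!r_col,
                                                  side == "L" ~ !!l_col))
}

df$var1_treated
# [1]  1  2 30 40
```

Compared with get(), unquoting a symbol makes it explicit that the value is looked up as a column of the data frame.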
Or using base R, with split.default to split the columns into a list based on the column name patterns
lst1 <- lapply(split.default(df[-1], sub('^[RL]_', '', names(df)[-1])),
               function(x) ifelse(df$side == 'R', x[[1]], x[[2]]))
df[paste0(names(lst1), '_treated')] <- lst1
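To see what split.default does in the base R approach, here is a small sketch on made-up data (the column names R_a, L_a, R_b, L_b and the objects groups and treated are illustrative only). It groups the columns by their name with the 'R_'/'L_' prefix removed, so each list element holds the R and L columns of one measure.

```r
# Hypothetical two-row example
df <- data.frame(
  side = c("R", "L"),
  R_a = c(1, 2), L_a = c(3, 4),
  R_b = c(5, 6), L_b = c(7, 8)
)

# Group the measure columns by name without the R_/L_ prefix
groups <- split.default(df[-1], sub("^[RL]_", "", names(df)[-1]))
names(groups)
# [1] "a" "b"
names(groups$a)
# [1] "R_a" "L_a"

# Within each group, pick the R column (first) or the L column (second) per row
treated <- lapply(groups, function(x) ifelse(df$side == "R", x[[1]], x[[2]]))
treated$a
# [1] 1 4
```

Note that x[[1]] and x[[2]] rely on the R column appearing before the L column for every measure in the original data frame.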
NOTE: Creating a data.frame via cbind converts the inputs to a matrix first (cbind on plain vectors returns a matrix), and a matrix can hold only a single type. Because the first column is character, all columns are changed to character. Instead, it would be
df <- data.frame(side, R_var1, L_var1, R_var2, L_var2, R_var3, L_var3)
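The type coercion described in the note can be verified with a short sketch (two made-up rows; the object names m, bad and good are illustrative):

```r
side <- c("R", "L")
R_var1 <- c(18.81, 26.97)

# cbind() on plain vectors builds a matrix, which holds a single type
m <- cbind(side, R_var1)
is.matrix(m)
# [1] TRUE
typeof(m)
# [1] "character"

bad  <- data.frame(cbind(side, R_var1))  # numeric column is no longer numeric
good <- data.frame(side, R_var1)         # each column keeps its own type

is.numeric(bad$R_var1)
# [1] FALSE
is.numeric(good$R_var1)
# [1] TRUE
```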
Upvotes: 3