Reputation: 487
I have a data frame with 40 variables G1_a
, G1_b
, ... till G20_a
, G20_b
(stemming from a survey). I want to create 20 new variables G1
... G20
that summarize the existing variables.
data <- data.frame(G1_a = c(0, 0, 0, 1, NA),
G1_b = c(0, 0, 1, 1, NA),
G2_a = c(0, 0, 0, 1, NA),
G2_b = c(0, 0, 1, 1, NA))
# Reshaping without for-loop:
data <- data %>%
mutate(G1 = case_when(
G1_a == 1 ~ "own_offer",
G1_b == 1 ~ "no_offer",
T ~ NA_character_
))
data <- data %>%
mutate(G2 = case_when(
G2_a == 1 ~ "own_offer",
G2_b == 1 ~ "no_offer",
T ~ NA_character_
))
I want to automate the creation of the new variables in a for-loop, something like:
# Reshaping with for-loop:
for(i in 1:2) {
data <- data %>%
mutate(assign(paste0("G", i), case_when(
get(paste0("G", i, "_a")) == 1 ~ "own_offer",
get(paste0("G", i, "_b")) == 1 ~ "no_offer",
T ~ NA_character_
)))
}
My question includes two parts:
1) Is it possible to combine assign
with mutate
? I'm aware of approaches like mutate(df, !!varname := Petal.width * n)
(see here) to dynamically assign parameter names. However, I was unable to combine it with the data reshaping I want to run.
2) Does dplyr
allow the use of paste0
together with case_when
and mutate
?
Upvotes: 2
Views: 721
Reputation: 227001
This is a little tricky, but I think it's the principled way to do it. The final result is a data frame with the desired columns, thus avoiding all of the get()
/assign()
headaches (and not cluttering up the workspace with lots of derived variables.) There are several steps where we change the shape of the data frame (wide -> long -> partially wide -> wide) using tidyr::gather()
and tidyr::spread()
. If it seems overwhelming, experiment with stopping the pipe sequence at various intermediate points to see what has been achieved so far.
library(tidyr)
library(dplyr)
dds <- (dd
%>% mutate(case=seq(n())) ## need a variable to distinguish rows in original data set
%>% gather(var,val,-case) ## -> long format: {case, var={G1_a,G1_b,...}, val={0,1,NA}}
%>% separate(var,c("var","response")) ## split to "G1","G2" + "a", "b"
%>% spread(response,val) ## convert back to semi-wide: {case, var, a, b}
## now collapse rows to categorical value, as above
%>% mutate(offer=case_when(a==1 ~ "own_offer",
b==1 ~ "no_offer",
TRUE ~ NA_character_))
%>% select(-c(a,b)) ## clean up now-redundant variables
%>% spread(var,offer) ## convert back to wide format: {case, G1, G2, ...}
%>% select(-case) ## now redundant
)
G1 G2
1 <NA> <NA>
2 <NA> <NA>
3 no_offer no_offer
4 own_offer own_offer
5 <NA> <NA>
Upvotes: 2