Reputation: 171
I have a data frame where each Var1, Var2, and Var3 contain questions (Var1 is not all Q1, it is a mix of Q1, Q2, Q3, etc.) and a1, a2, and a3 are responses associated with Var1-3. E.g.
have <- tribble(
~Var1,~a1,~Var2,~a2,~Var3,~a3,
#---|----|----|----|----|----|
"Q1", "R1","Q3","R3","Q4","R4",
"Q3", "R3","Q1","R1","Q3","R3",
"Q3", "R3","Q1","R1","Q2","R3",
"Q2", "R2","Q4","R4","Q1","R1"
)
I want to loop through variables var1-3, and where var1-3 = Q1, I want its associated response printed out in a new column.
want <- tribble(
~Var1,~a1,~Var2,~a2,~Var3,~a3,~new_col,
#---|----|----|----|----|----|----|
"Q1", "R1","Q3","R3","Q4","R4","R1",
"Q3", "R3","Q1","R1","Q3","R3","R1",
"Q3", "R3","Q1","R1","Q2","R3","R1",
"Q2", "R2","Q4","R4","Q1","R1","R1"
)
The below code does what I want but is there a more efficient way using a loop since I would have to replicate this code multiple times with the different questions? For example, if I also wanted to do the same thing w/ Q2 and Q3, using the code below, I would have to create two more variables, one for each question to ensure that the correct attribute was assigned to the respective columns.
want <- have %>%
mutate(new_col = case_when(
Var1 == "Q1" ~ a1,
Var2 == "Q1" ~ a2,
Var3 == "Q1" ~ a3))
Upvotes: 2
Views: 72
Reputation: 206546
Well, it might be easier to tackle the problem in a different way. you can reshape your data with dplyr
and tidyr
have %>%
mutate(rowid=row_number()) %>%
pivot_longer(-rowid, names_pattern="(\\w+)(\\d+)", names_to = c(".value", "seq"))
which gives
rowid seq Var a
<int> <chr> <chr> <chr>
1 1 1 Q1 R1
2 1 2 Q3 R3
3 1 3 Q4 R4
4 2 1 Q3 R3
5 2 2 Q1 R1
6 2 3 Q3 R3
7 3 1 Q3 R3
8 3 2 Q1 R1
9 3 3 Q2 R3
10 4 1 Q2 R2
11 4 2 Q4 R4
12 4 3 Q1 R1
So you can now more easily query based on question. For example
have %>%
mutate(rowid=row_number()) %>%
pivot_longer(-rowid, names_pattern="(\\w+)(\\d+)", names_to = c(".value", "seq")) %>%
filter(Var=="Q1")
results in
rowid seq Var a
<int> <chr> <chr> <chr>
1 1 1 Q1 R1
2 2 2 Q1 R1
3 3 2 Q1 R1
4 4 3 Q1 R1
So you can see that all the Q1 values correspond to R1 and you can see which position in the sequence each occurred (1, 2, 2, 3).
If you wanted to reorganize by question you could do something like
have %>%
mutate(rowid=row_number()) %>%
pivot_longer(-rowid, names_pattern="(\\w+)(\\d+)", names_to = c(".value", "seq")) %>%
select(-seq) %>%
unique() %>%
pivot_wider(rowid, names_from=Var, values_from=a)
Note that we had to put in a unique()
because row 2 had Q3 twice. You'd have to decide exactly how you would want to handle that if that occurs in your real data
rowid Q1 Q3 Q4 Q2
<int> <chr> <chr> <chr> <chr>
1 1 R1 R3 R4 NA
2 2 R1 R3 NA NA
3 3 R1 R3 NA R3
4 4 R1 NA R4 R2
But here it may be easier to see where odd values occur. For example the R3 in Q2 for row 3.
Upvotes: 1