Reputation: 123
I have a dataset in 'wide' format that I would like to convert to a non-standard long format. At least, that is how I would characterize this problem.
The original dataset mimics the following:
d1 <- data.frame('id' = c(1,2),
'Q1' = c(2,3),
'Q2' = c(1,3),
'Q3' = c(3,1))
d1
id Q1 Q2 Q3
1 1 2 1 3
2 2 3 3 1
In this example, there are two individuals who have answered three questions. The answer to each question takes the following values {1,2,3}. So, in this examples, individual 1 answered 2 to Q1, 1 to Q2, and 3 for Q3. I now need to convert to a 'long' format that would be take the following format. For each individual and each possible answer
d2 <- data.frame('id'= rep(seq(1:2),each=9),
'question' = rep(seq(1:3), each=3),
'option' = rep(seq(1:3)),
'choice' = 0)
d2
id question option choice
1 1 1 1 0
2 1 1 2 0
3 1 1 3 0
4 1 2 1 0
5 1 2 2 0
6 1 2 3 0
7 1 3 1 0
8 1 3 2 0
9 1 3 3 0
10 2 1 1 0
11 2 1 2 0
12 2 1 3 0
13 2 2 1 0
14 2 2 2 0
15 2 2 3 0
16 2 3 1 0
17 2 3 2 0
18 2 3 3 0
The part of I am struggling with is how to 'merge' or 'reshape' the data from d1 into d2 so that the final outcome would look like the following with the choice column reflecting the answers given in dataframe d1:
id question option choice
1 1 1 1 0
2 1 1 2 1
3 1 1 3 0
4 1 2 1 1
5 1 2 2 0
6 1 2 3 0
7 1 3 1 0
8 1 3 2 0
9 1 3 3 1
10 2 1 1 0
11 2 1 2 0
12 2 1 3 1
13 2 2 1 0
14 2 2 2 0
15 2 2 3 1
16 2 3 1 1
17 2 3 2 0
18 2 3 3 0
Individual 1 did not chose option 1 or 3 in question 1, but DID choose option 2 as indicated in the dummy coding in the choice column.
Any thoughts on this would be greatly appreciated.
Upvotes: 0
Views: 91
Reputation: 39154
d3
is the final output.
d1 <- data.frame('id' = c(1,2),
'Q1' = c(2,3),
'Q2' = c(1,3),
'Q3' = c(3,1))
library(dplyr)
library(tidyr)
d2 <- d1 %>%
gather(question, option, -id)
d3 <- d2 %>%
complete(id, question, option) %>%
left_join(d2, by = c("id", "question")) %>%
mutate(question = sub("Q", "", question)) %>%
mutate(option.y = ifelse(option.y == option.x, 1, 0)) %>%
rename(option = option.x, choice = option.y)
Here is a more concise approach. dt2
is the final output.
d2 <- d1 %>%
gather(question, option, -id) %>%
mutate(choice = 1) %>%
complete(id, question, option, fill = list("choice" = 0)) %>%
mutate(question = sub("Q", "", question))
Upvotes: 2