mpap
mpap

Reputation: 123

Wide to long data conversion

I have a dataset in 'wide' format that I would like to convert to a non-standard long format. At least, that is how I would characterize this problem.

The original dataset mimics the following:

d1 <- data.frame('id' = c(1,2),
                 'Q1' = c(2,3),
                 'Q2' = c(1,3),
                 'Q3' = c(3,1))
d1
   id Q1 Q2 Q3
 1  1  2  1  3
 2  2  3  3  1 

In this example, there are two individuals who have answered three questions. The answer to each question takes the following values {1,2,3}. So, in this examples, individual 1 answered 2 to Q1, 1 to Q2, and 3 for Q3. I now need to convert to a 'long' format that would be take the following format. For each individual and each possible answer

d2 <- data.frame('id'= rep(seq(1:2),each=9),
                 'question' = rep(seq(1:3), each=3),
                 'option' = rep(seq(1:3)),
                 'choice' = 0)

d2
   id  question option choice
1   1         1      1      0
2   1         1      2      0
3   1         1      3      0
4   1         2      1      0
5   1         2      2      0
6   1         2      3      0
7   1         3      1      0
8   1         3      2      0
9   1         3      3      0
10  2         1      1      0
11  2         1      2      0
12  2         1      3      0
13  2         2      1      0
14  2         2      2      0
15  2         2      3      0
16  2         3      1      0
17  2         3      2      0
18  2         3      3      0

The part of I am struggling with is how to 'merge' or 'reshape' the data from d1 into d2 so that the final outcome would look like the following with the choice column reflecting the answers given in dataframe d1:

   id question option choice
1   1         1      1      0
2   1         1      2      1
3   1         1      3      0
4   1         2      1      1
5   1         2      2      0
6   1         2      3      0
7   1         3      1      0
8   1         3      2      0
9   1         3      3      1
10  2         1      1      0
11  2         1      2      0
12  2         1      3      1
13  2         2      1      0
14  2         2      2      0
15  2         2      3      1
16  2         3      1      1
17  2         3      2      0
18  2         3      3      0

Individual 1 did not chose option 1 or 3 in question 1, but DID choose option 2 as indicated in the dummy coding in the choice column.

Any thoughts on this would be greatly appreciated.

Upvotes: 0

Views: 91

Answers (1)

www
www

Reputation: 39154

d3 is the final output.

d1 <- data.frame('id' = c(1,2),
                 'Q1' = c(2,3),
                 'Q2' = c(1,3),
                 'Q3' = c(3,1))

library(dplyr)
library(tidyr)

d2 <- d1 %>%
  gather(question, option, -id) 

d3 <- d2 %>%
  complete(id, question, option) %>%
  left_join(d2, by = c("id", "question")) %>%
  mutate(question = sub("Q", "", question)) %>%
  mutate(option.y = ifelse(option.y == option.x, 1, 0)) %>%
  rename(option = option.x, choice = option.y)

Update

Here is a more concise approach. dt2 is the final output.

d2 <- d1 %>%
  gather(question, option, -id) %>%
  mutate(choice = 1) %>%
  complete(id, question, option, fill = list("choice" = 0)) %>%
  mutate(question = sub("Q", "", question))

Upvotes: 2

Related Questions