mpap
mpap

Reputation: 123

Wide to long format with several variables

This question is related to a previous question I asked on converting from wide to long format in R with an additional complication.

previous question is here: Wide to long data conversion

The wide data I start with looks like the following:

d2 <- data.frame('id' = c(1,2),
             'Q1' = c(2,3),
             'Q2' = c(1,3),
             'Q3' = c(3,1),
             'Q1_X_Opt_1' = c(0,0),
             'Q1_X_Opt_2' = c(75,200),
             'Q1_X_Opt_3' = c(150,300),
             'Q2_X_Opt_1' = c(0,0),
             'Q2_X_Opt_2' = c(150,200),
             'Q2_X_Opt_3' = c(75,300),
             'Q3_X_Opt_1' = c(0,0),
             'Q3_X_Opt_2' = c(100,500),
             'Q3_X_Opt_3' = c(150,300))

In this example, there are two individuals who have answered three questions. The answer to each question takes the following values {1,2,3} encoded in Q1, Q2, and Q3. So, in this examples, individual 1 chose option 2 in Q1, chose option 1 in Q2, and chose option 3 in Q3.

For each option there is also a variable X associated with each option that I also need to be converted to wide format. The output I am seeking looks like the following:

    id question option choice cost
1   1        1      1      0    0
2   1        1      2      1   75
3   1        1      3      0  150
4   1        2      1      1    0
5   1        2      2      0  150
6   1        2      3      0   75
7   1        3      1      0    0
8   1        3      2      0  100
9   1        3      3      1  150
10  2        1      1      0    0
11  2        1      2      0  200
12  2        1      3      1  300
13  2        2      1      0    0
14  2        2      2      0  200
15  2        2      3      1  300
16  2        3      1      1    0
17  2        3      2      0  500
18  2        3      3      0  300

I have tried to adapting the code from the answer to the prior question, but with no success thus far. Thanks for any suggestions or comments.

Upvotes: 0

Views: 445

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 270268

1) First melt the input transformihg it to long form. Then break apart the variable column on underscore using read.table giving columns named V1, V2, V3, V4 representing the question as a factor, junk, junk and the option parts, respectively. Append that back to m and set the question to the factor level of V1 and option to V4. Sort it by id to give the same ordering as in the question. (If the order does not matter this line could be omiited.)

Now put the parts together noting that choice is 1 if the appropriate column among the Q1/Q2/Q3 columns equals the option and 0 otherwise.

library(reshape2)

m <- melt(d2, id = 1:4)
m <- cbind(m, read.table(text = as.character(m$variable), sep = "_"))
m <- transform(m, question = as.numeric(V1), option = V4)
m <- m[order(m$id), ]
n <- nrow(m)
with(m, data.frame(id, 
   question, 
   option,
   choice = (m[cbind(1:n, question + 1)] == option) + 0, 
   value))

The result is:

   id question option choice value
1   1        1      1      0     0
2   1        1      2      1    75
3   1        1      3      0   150
4   1        2      1      1     0
5   1        2      2      0   150
6   1        2      3      0    75
7   1        3      1      0     0
8   1        3      2      0   100
9   1        3      3      1   150
10  2        1      1      0     0
11  2        1      2      0   200
12  2        1      3      1   300
13  2        2      1      0     0
14  2        2      2      0   200
15  2        2      3      1   300
16  2        3      1      1     0
17  2        3      2      0   500
18  2        3      3      0   300

2) This could also be expressed using magirttr giving the same answer. Note that the last two pipes use the exposition operator %$% providing an implicit with(., ...) around the subsequent expression:

library(magrittr)
library(reshape2)

d2 %>%
   melt(id = 1:4) %>%
   cbind(read.table(text = as.character(.$variable), sep = "_")) %>%
   transform(question = as.numeric(V1), option = V4) %$%
   .[order(id), ] %$%
   data.frame(id, 
              question, 
              option, 
              choice = (.[cbind(1:nrow(.), question + 1)] == option) + 0, 
              value)

3) This can be translated to reshape2/dplyr/tidyr:

library(reshape2)
library(dplyr)
library(tidyr)

d2 %>%
   melt(id = 1:4) %>%
   separate(variable, c("question", "X", "Opt", "option")) %>%
   arrange(id) %>%
   mutate(question = as.numeric(factor(question)),
          choice = (.[cbind(1:n(), question + 1)] == option) + 0) %>%
   select(id, question, option, choice, value)

Upvotes: 1

alistaire
alistaire

Reputation: 43354

It's not exactly elegant, but here's a tidyverse version:

library(tidyverse)

d3 <- d2 %>% 
    gather(option, cost, -id:-Q3) %>% 
    gather(question, choice, Q1:Q3) %>% 
    separate(option, c('question2', 'option'), extra = 'merge') %>% 
    filter(question == question2) %>% 
    mutate_at(vars(question, option), parse_number) %>% 
    mutate(choice = as.integer(option == choice)) %>% 
    select(1, 5, 3, 6, 4) %>% 
    arrange(id)

d3
#>    id question option choice cost
#> 1   1        1      1      0    0
#> 2   1        1      2      1   75
#> 3   1        1      3      0  150
#> 4   1        2      1      1    0
#> 5   1        2      2      0  150
#> 6   1        2      3      0   75
#> 7   1        3      1      0    0
#> 8   1        3      2      0  100
#> 9   1        3      3      1  150
#> 10  2        1      1      0    0
#> 11  2        1      2      0  200
#> 12  2        1      3      1  300
#> 13  2        2      1      0    0
#> 14  2        2      2      0  200
#> 15  2        2      3      1  300
#> 16  2        3      1      1    0
#> 17  2        3      2      0  500
#> 18  2        3      3      0  300

Upvotes: 2

Related Questions