How to convert this wide data frame to this long data frame?

Question

How to convert this wide data frame

# A tibble: 2 x 7
  name  question_1  question_1_response question_2    question_2_response question_3 question_3_response
                                                                     
1 ken   PC1,PC2,PC4                 4.5 PC3,MK1,MK2                   3.5 SBP1,SBP5                    5
2 hello PC1,PC5                     4   MK1,SBP1,SBP2                 4   NA                          NA

To this?

# A tibble: 13 x 3
   name  subcomp value
       
 1 ken   PC1       4.5
 2 ken   PC2       4.5
 3 ken   PC4       4.5
 4 ken   PC3       3.5
 5 ken   MK1       3.5
 6 ken   MK2       3.5
 7 ken   SBP1      5  
 8 ken   SBP5      5  
 9 hello PC1       4  
10 hello PC5       4  
11 hello MK1       4  
12 hello SBP1      4  
13 hello SBP2      4

Sample data:

library(tidyverse)
test <- tribble(
  ~name, ~question_1, ~question_1_response, ~question_2, ~question_2_response, ~question_3, ~question_3_response,
  "ken", "PC1,PC2,PC4", 4.5, "PC3,MK1,MK2", 3.5, "SBP1,SBP5", 5,
  "hello", "PC1,PC5", 4, "MK1,SBP1,SBP2", 4, NA, NA
)

I tried using gather/separate/spread but can't quite wrap my head around it. Thank you much!

akrun · Accepted Answer

We can rename the 'response' columns with str_replace (here we capture the digits (\d+) after the _ as a group, and word (\w+) that follows the _ at the end. In the replacement, specify the backreference (\1, \2) of the captured group in reverse order) and reshape it to 'long' format with pivot_longer.

library(dplyr)
library(tidyr)
library(stringr)
test %>%
    rename_at(vars(ends_with('response')),
        ~ str_replace(., '_(\d+)_(\w+)', '\2_\1')) %>% 
    pivot_longer(cols = -name, names_to = c('.value', 'group'), 
         names_sep="_", values_drop_na = TRUE) %>% 
    separate_rows(question)%>% 
    select(name, subcomp = question, value = questionresponse)
# A tibble: 13 x 3
#   name  subcomp value
#       
# 1 ken   PC1       4.5
# 2 ken   PC2       4.5
# 3 ken   PC4       4.5
# 4 ken   PC3       3.5
# 5 ken   MK1       3.5
# 6 ken   MK2       3.5
# 7 ken   SBP1      5  
# 8 ken   SBP5      5  
# 9 hello PC1       4  
#10 hello PC5       4  
#11 hello MK1       4  
#12 hello SBP1      4  
#13 hello SBP2      4

Or using data.table

library(data.table)
library(splitstackshape)
melt(setDT(test), measure = patterns("\d+$", "response$"), 
    value.name = c("subcomp", 'value'), na.rm = TRUE)[, 
       cSplit(.SD, "subcomp", ",", "long")][, variable := NULL][]
#    name subcomp value
# 1:   ken     PC1   4.5
# 2:   ken     PC2   4.5
# 3:   ken     PC4   4.5
# 4: hello     PC1   4.0
# 5: hello     PC5   4.0
# 6:   ken     PC3   3.5
# 7:   ken     MK1   3.5
# 8:   ken     MK2   3.5
# 9: hello     MK1   4.0
#10: hello    SBP1   4.0
#11: hello    SBP2   4.0
#12:   ken    SBP1   5.0
#13:   ken    SBP5   5.0

How to convert this wide data frame to this long data frame?

Answers (2)

Related Questions