Paul
Paul

Reputation: 145

How to pivot multiple datframe columns as a group in r

I've read many posts on pivoting/stacking/reshaping dataframes, but none have quite gotten me to a solution for my problem. Consider the following df:

QID <- c('Q1', 'Q2', 'Q3')
Qtype <- c('Matrix', 'TE', 'DB')
Qtext <- c('A', 'B', 'C')
subQuestion.1.recode <- c('1', NA, NA)
subQuestion.1.description <- c('foo', NA, NA)
subQuestion.1.choice <- c('baa', NA, NA)
subQuestion.2.recode <- c('2', NA, NA)
subQuestion.2.description <- c('foo2', NA, NA)
subQuestion.2.choice <- c('baa2', NA, NA)
subQuestion.3.recode <- c('3', NA, NA)
subQuestion.3.description <- c('foo3', NA, NA)
subQuestion.3.choice <- c('baa3', NA, NA)

df <- data.frame(QID, Qtype, Qtext, subQuestion.1.recode, subQuestion.1.description, subQuestion.1.choice, subQuestion.2.recode, 
                 subQuestion.2.description, subQuestion.2.choice, subQuestion.3.recode, subQuestion.3.description, subQuestion.3.choice)

I want to pivot the 'subQuestion' columns from wide to long. I also want to keep the 'subQuestion' questions together as rows by the number in the column name (which is the same number as in the 'subQuestion.recode' columns). So the output would look like this:

QID <- c('Q1', 'Q1', 'Q1', 'Q2', 'Q3')
Qtype <- c('Matrix', 'Matrix', 'Matrix', 'TE', 'DB')
Qtext <- c('A', 'A', 'A', 'B', 'C')
subQuestion.recode <- c('1', '2', '3', NA, NA)
subQuestion.description <- c('foo', 'foo2', 'foo3', NA, NA)
subQuestion.choice<- c('baa', 'baa2', 'baa3', NA, NA)

df_out <- data.frame(QID, Qtype, Qtext, subQuestion.recode, subQuestion.description, subQuestion.choice)

Thanks in advance for any help!

Upvotes: 0

Views: 53

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269644

This is a base solution.

Form subq as a character vector of the names of the columns whose names include sub. Split that vector into a list whose three components are the names of the choice columns, the names of the description columns and the names of the recode columns giving varying.

We then reshape to long form using varying.

Keep only the columns whose names contain Q or sub and remove duplicate rows giving u.

Finally sort by the rownames. This step, i.e. the last two lines of code below, can be omitted if the order does not matter.

subq <- grep("sub", names(df), value = TRUE)
varying <- split(subq, sub(".*\\.", "subQuestion.", subq))
long <- reshape(df, dir = "long", varying = varying, v.names = names(varying))

u <- unique(long[grep("Q|sub", names(long))])
o <- order(rownames(u))
out <- u[o, ]

giving:

> out
    QID  Qtype Qtext subQuestion.choice subQuestion.description subQuestion.recode
1.1  Q1 Matrix     A                baa                     foo                  1
1.2  Q1 Matrix     A               baa2                    foo2                  2
1.3  Q1 Matrix     A               baa3                    foo3                  3
2.1  Q2     TE     B               <NA>                    <NA>               <NA>
3.1  Q3     DB     C               <NA>                    <NA>               <NA>

Upvotes: 1

Duck
Duck

Reputation: 39595

Try this. The key in this code is to separate the variable of interest name after reshaping to long your data with pivot_longer(). Also, and id is necessary to track the rows and can be obtained with the number inside your variables. With that done, you only have to clean the values and reshape to wide keeping as id variables those one you meantioned in the question. Here the code with tidyverse approach:

library(tidyverse)
#Code
df %>% pivot_longer(cols = -c(QID,Qtype)) %>%
  separate(name,c('V1','V2','V3'),sep = '\\.') %>%
  group_by(QID,Qtype) %>%
  fill(V2,.direction = 'up') %>%
  filter(!is.na(value)) %>%
  mutate(Var=ifelse(is.na(V3),V1,paste0(V1,'.',V3))) %>%
  select(-V3) %>% ungroup() %>%
  select(-V1) %>%
  pivot_wider(names_from = Var,values_from=value) %>%
  select(-V2) %>% fill(Qtext)

Output:

# A tibble: 5 x 6
  QID   Qtype  Qtext subQuestion.recode subQuestion.description subQuestion.choice
  <chr> <chr>  <chr> <chr>              <chr>                   <chr>             
1 Q1    Matrix A     1                  foo                     baa               
2 Q1    Matrix A     2                  foo2                    baa2              
3 Q1    Matrix A     3                  foo3                    baa3              
4 Q2    TE     B     NA                 NA                      NA                
5 Q3    DB     C     NA                 NA                      NA    

Some data used:

#Data
df <- structure(list(QID = c("Q1", "Q2", "Q3"), Qtype = c("Matrix", 
"TE", "DB"), Qtext = c("A", "B", "C"), subQuestion.1.recode = c("1", 
NA, NA), subQuestion.1.description = c("foo", NA, NA), subQuestion.1.choice = c("baa", 
NA, NA), subQuestion.2.recode = c("2", NA, NA), subQuestion.2.description = c("foo2", 
NA, NA), subQuestion.2.choice = c("baa2", NA, NA), subQuestion.3.recode = c("3", 
NA, NA), subQuestion.3.description = c("foo3", NA, NA), subQuestion.3.choice = c("baa3", 
NA, NA)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

Related Questions