Reputation: 145
I've read many posts on pivoting/stacking/reshaping dataframes, but none have quite gotten me to a solution for my problem. Consider the following df:
QID <- c('Q1', 'Q2', 'Q3')
Qtype <- c('Matrix', 'TE', 'DB')
Qtext <- c('A', 'B', 'C')
subQuestion.1.recode <- c('1', NA, NA)
subQuestion.1.description <- c('foo', NA, NA)
subQuestion.1.choice <- c('baa', NA, NA)
subQuestion.2.recode <- c('2', NA, NA)
subQuestion.2.description <- c('foo2', NA, NA)
subQuestion.2.choice <- c('baa2', NA, NA)
subQuestion.3.recode <- c('3', NA, NA)
subQuestion.3.description <- c('foo3', NA, NA)
subQuestion.3.choice <- c('baa3', NA, NA)
df <- data.frame(QID, Qtype, Qtext, subQuestion.1.recode, subQuestion.1.description, subQuestion.1.choice, subQuestion.2.recode,
subQuestion.2.description, subQuestion.2.choice, subQuestion.3.recode, subQuestion.3.description, subQuestion.3.choice)
I want to pivot the 'subQuestion' columns from wide to long. I also want to keep the 'subQuestion' questions together as rows by the number in the column name (which is the same number as in the 'subQuestion.recode' columns). So the output would look like this:
QID <- c('Q1', 'Q1', 'Q1', 'Q2', 'Q3')
Qtype <- c('Matrix', 'Matrix', 'Matrix', 'TE', 'DB')
Qtext <- c('A', 'A', 'A', 'B', 'C')
subQuestion.recode <- c('1', '2', '3', NA, NA)
subQuestion.description <- c('foo', 'foo2', 'foo3', NA, NA)
subQuestion.choice<- c('baa', 'baa2', 'baa3', NA, NA)
df_out <- data.frame(QID, Qtype, Qtext, subQuestion.recode, subQuestion.description, subQuestion.choice)
Thanks in advance for any help!
Upvotes: 0
Views: 53
Reputation: 269644
This is a base solution.
Form subq
as a character vector of the names of the columns whose names include sub
. Split that vector into a list whose three components are the names of the choice
columns, the names of the description
columns and the names of the recode
columns giving varying
.
We then reshape
to long
form using varying
.
Keep only the columns whose names contain Q
or sub
and remove duplicate rows giving u
.
Finally sort by the rownames. This step, i.e. the last two lines of code below, can be omitted if the order does not matter.
subq <- grep("sub", names(df), value = TRUE)
varying <- split(subq, sub(".*\\.", "subQuestion.", subq))
long <- reshape(df, dir = "long", varying = varying, v.names = names(varying))
u <- unique(long[grep("Q|sub", names(long))])
o <- order(rownames(u))
out <- u[o, ]
giving:
> out
QID Qtype Qtext subQuestion.choice subQuestion.description subQuestion.recode
1.1 Q1 Matrix A baa foo 1
1.2 Q1 Matrix A baa2 foo2 2
1.3 Q1 Matrix A baa3 foo3 3
2.1 Q2 TE B <NA> <NA> <NA>
3.1 Q3 DB C <NA> <NA> <NA>
Upvotes: 1
Reputation: 39595
Try this. The key in this code is to separate the variable of interest name
after reshaping to long your data with pivot_longer()
. Also, and id is necessary to track the rows and can be obtained with the number inside your variables. With that done, you only have to clean the values and reshape to wide keeping as id variables those one you meantioned in the question. Here the code with tidyverse
approach:
library(tidyverse)
#Code
df %>% pivot_longer(cols = -c(QID,Qtype)) %>%
separate(name,c('V1','V2','V3'),sep = '\\.') %>%
group_by(QID,Qtype) %>%
fill(V2,.direction = 'up') %>%
filter(!is.na(value)) %>%
mutate(Var=ifelse(is.na(V3),V1,paste0(V1,'.',V3))) %>%
select(-V3) %>% ungroup() %>%
select(-V1) %>%
pivot_wider(names_from = Var,values_from=value) %>%
select(-V2) %>% fill(Qtext)
Output:
# A tibble: 5 x 6
QID Qtype Qtext subQuestion.recode subQuestion.description subQuestion.choice
<chr> <chr> <chr> <chr> <chr> <chr>
1 Q1 Matrix A 1 foo baa
2 Q1 Matrix A 2 foo2 baa2
3 Q1 Matrix A 3 foo3 baa3
4 Q2 TE B NA NA NA
5 Q3 DB C NA NA NA
Some data used:
#Data
df <- structure(list(QID = c("Q1", "Q2", "Q3"), Qtype = c("Matrix",
"TE", "DB"), Qtext = c("A", "B", "C"), subQuestion.1.recode = c("1",
NA, NA), subQuestion.1.description = c("foo", NA, NA), subQuestion.1.choice = c("baa",
NA, NA), subQuestion.2.recode = c("2", NA, NA), subQuestion.2.description = c("foo2",
NA, NA), subQuestion.2.choice = c("baa2", NA, NA), subQuestion.3.recode = c("3",
NA, NA), subQuestion.3.description = c("foo3", NA, NA), subQuestion.3.choice = c("baa3",
NA, NA)), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1