Michael Lenard

Reputation: 1

How do I pivot only specific columns from wide to long in R tidyverse?

I'm having a hell of a time figuring out a specific pivot I need for some data processing. I had to pivot my data wider before a join to avoid the multiplicative join issue, but four of the columns need to be pivoted back to long for the final dataset. The actual data is pretty unwieldy, so I'll start with a toy example that I think gets at the same issue:

Data I have now:

A, B, C, D, E, F, G, H, I, J, K, L
w1, w2, w3, w4, l1, l2, l3, l4, l5, l6, l7, l8
w5, w6, w7, w8, l9, l10, l11, l12, l13, l14, l15, l16

Format I need it in:

A, B, C, D, M, N, O, P
w1, w2, w3, w4, l1, l2, l3, l4
w1, w2, w3, w4, l5, l6, l7, l8
w5, w6, w7, w8, l9, l10, l11, l12
w5, w6, w7, w8, l13, l14, l15, l16

Basically, I have a set of data where a large subset of columns needs to be lengthened (or 'stacked') every 4th column. One column needs l1, l5, l9, l13, ..., l(4n+1), the next needs l2, l6, l10, l14, ..., l(4n+2), etc. I don't mind rearranging the columns if that makes the pivot easier, but I have no idea how to make R do this for me. The documentation on pivot_longer and pivot_longer_spec assumes data that is... a bit nicer than what I have to work with, and its examples aren't helpful for this task. It also seems to assume that important data is contained in the column names, which is simply not the case for my data: I only need the cell values in a particular configuration.

The actual wide dataset looks like this: https://i.sstatic.net/hRhBw.png, so what I need it to look like is

[wide columns], T1.y, data.consensus_text_T2, data.consensus_text_T3, data.consensus_text_T4,
[wide columns], T7, data.consensus_text_T8.y, data.consensus_text_T9.y, data.consensus_text_T10.y,
[wide columns], T13, data.consensus_text_T14, data.consensus_text_T15, data.consensus_text_T16

and so on until it repeats back at T1.y with new values in the wide columns after 14 rows.

Thanks for any help!

Upvotes: 0

Views: 1973

Answers (1)

G. Grothendieck

Reputation: 269624

1) pivot Assuming dd is defined reproducibly as in the Note at the end, convert it to long form, then add a name column holding the new column names and an index column i built with gl. The arguments of gl are the number of output rows each input row maps to (2), the number of new columns (4, not counting the id columns) and the number of rows in the long form data frame, so i equals c(1,1,1,1,2,2,2,2) recycled to the length of the long form data frame. Finally, convert back to wide form, using the id columns plus i to identify the output rows.

library(dplyr)
library(tidyr)

dd %>%
  pivot_longer(-(A:D)) %>%
  mutate(name = rep(c("M", "N", "O", "P"), length.out = n()), i = gl(2, 4, n())) %>%
  pivot_wider(id_cols = c(A:D, i)) %>%  # id_cols named explicitly rather than passed by position
  select(-i)

giving:

# A tibble: 4 x 8
  A     B     C     D     M     N     O     P    
  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 w1    w2    w3    w4    l1    l2    l3    l4   
2 w1    w2    w3    w4    l5    l6    l7    l8   
3 w5    w6    w7    w8    l9    l10   l11   l12  
4 w5    w6    w7    w8    l13   l14   l15   l16  
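
To see why this works, it can help to look at the intermediate long form before the final pivot_wider. The following is only an illustrative sketch: it rebuilds the toy dd from the Note with data.frame() so that it runs on its own, and the object name long is mine, not part of the code above.

```r
library(dplyr)
library(tidyr)

# Toy data, same values as dd in the Note at the end of this answer
dd <- data.frame(
  A = c("w1", "w5"), B = c("w2", "w6"), C = c("w3", "w7"), D = c("w4", "w8"),
  E = c("l1", "l9"), F = c("l2", "l10"), G = c("l3", "l11"), H = c("l4", "l12"),
  I = c("l5", "l13"), J = c("l6", "l14"), K = c("l7", "l15"), L = c("l8", "l16"),
  stringsAsFactors = FALSE
)

# gl(n, k, length) creates n levels, each repeated k times, recycled to `length`
as.integer(gl(2, 4, 16))
# 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2

# Long form with the two helper columns, before pivoting back to wide
long <- dd %>%
  pivot_longer(-(A:D)) %>%
  mutate(name = rep(c("M", "N", "O", "P"), length.out = n()),
         i = gl(2, 4, n()))

# For the first input row, name cycles M, N, O, P twice while i is 1 then 2
long %>% filter(A == "w1") %>% select(name, value, i)
```

Each (A, B, C, D, i) combination then becomes one output row, and name decides which of M, N, O, P a value lands in.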

1a) or more generally:

nid <- 4  # first nid columns are id columns
newnames <- c("M", "N", "O", "P")

k <- length(newnames)
nc <- ncol(dd)
ids <- names(dd)[1:nid]
dd %>%
  pivot_longer(-(1:nid)) %>%
  mutate(name = rep(newnames, length.out = n()), 
         i = gl((nc-nid)/k, k, n())) %>%
  pivot_wider(id_cols = all_of(c(ids, "i"))) %>%
  select(-i)
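
If this is needed in more than one place, the general version could be wrapped in a small function. This is a rough sketch only: stack_blocks is a hypothetical name of mine (not a tidyr function), and it assumes that the id columns come first, that they uniquely identify each input row, and that the remaining columns divide evenly into blocks of length(newnames).

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not part of tidyr): stack the non-id columns of `data`
# in consecutive blocks of length(newnames) columns each
stack_blocks <- function(data, nid, newnames) {
  k <- length(newnames)
  nblocks <- (ncol(data) - nid) / k
  # Fail early if the non-id columns do not split evenly into blocks
  stopifnot(nblocks >= 1, nblocks == round(nblocks))
  ids <- names(data)[seq_len(nid)]
  data %>%
    pivot_longer(-all_of(ids)) %>%
    mutate(name = rep(newnames, length.out = n()),
           i = gl(nblocks, k, n())) %>%
    pivot_wider(id_cols = all_of(c(ids, "i"))) %>%
    select(-i)
}

# Toy data, same values as dd in the Note at the end of this answer
dd <- data.frame(
  A = c("w1", "w5"), B = c("w2", "w6"), C = c("w3", "w7"), D = c("w4", "w8"),
  E = c("l1", "l9"), F = c("l2", "l10"), G = c("l3", "l11"), H = c("l4", "l12"),
  I = c("l5", "l13"), J = c("l6", "l14"), K = c("l7", "l15"), L = c("l8", "l16"),
  stringsAsFactors = FALSE
)

res <- stack_blocks(dd, nid = 4, newnames = c("M", "N", "O", "P"))
res
```

The stopifnot guard is there because gl would otherwise be called with a fractional number of blocks and the name and i columns would quietly fall out of step.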

2) group_modify Another approach is to go row by row and construct each pair of output rows explicitly.

dd %>%
 group_by(across(A:D)) %>%
 group_modify(~ with(., tibble(M=c(E,I), N=c(F,J), O=c(G,K), P=c(H,L)))) %>%
 ungroup

2a) or more generally

newnames <- c("M", "N", "O", "P")
dd %>%
 group_by(across(A:D)) %>%
 group_modify(~ matrix(unlist(.), ncol = length(newnames), byrow = TRUE) %>%
   as.data.frame %>%
   setNames(newnames)
 ) %>%
 ungroup
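
What the matrix call does to a single group can be checked in isolation. This is a small base R sketch; row1 and out are names I made up, and row1 simply holds the non-id values of the first row of the toy data.

```r
# Non-id values of one input row (first row of the toy data, columns E to L)
row1 <- data.frame(E = "l1", F = "l2", G = "l3", H = "l4",
                   I = "l5", J = "l6", K = "l7", L = "l8",
                   stringsAsFactors = FALSE)
newnames <- c("M", "N", "O", "P")

# unlist() flattens the row into a vector in column order; byrow = TRUE then
# fills the matrix left to right, starting a new row after every 4 values
m <- matrix(unlist(row1), ncol = length(newnames), byrow = TRUE)
out <- setNames(as.data.frame(m), newnames)
out
#    M  N  O  P
# 1 l1 l2 l3 l4
# 2 l5 l6 l7 l8
```

group_modify applies the same reshaping to each group and puts the grouping columns A to D back in front.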

3) A third approach is to define the two halves and then interleave them via a left join.

id <- 1:4
i1 <- 5:8    # non-id columns that go in 1st row of pair
i2 <- 9:12   # non-id columns that go in 2nd row of pair

d1 <- dd[-i2]
d2 <- dd[-i1]
names(d1)[-id] <- names(d2)[-id] <- c("M", "N", "O", "P")
left_join(dd[id], bind_rows(d1, d2))

3a) or more generally:

nid <- 4  # no of id columns
newnames <- c("M", "N", "O", "P")

nc <- ncol(dd)
k <- length(newnames)
s <- split.default(dd, c(rep(0, nid), gl((nc - nid) / k, k)))
L <- lapply(s[-1], setNames, newnames)
r <- bind_rows(lapply(L, function(x) bind_cols(s[[1]], x)))
left_join(s[[1]], r)
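
The split.default step is probably the least obvious part of 3a, so here is a short base R sketch of what it produces on the toy data. The variable name grp is mine; everything else follows the code above.

```r
# Toy data, same values as dd in the Note below
dd <- data.frame(
  A = c("w1", "w5"), B = c("w2", "w6"), C = c("w3", "w7"), D = c("w4", "w8"),
  E = c("l1", "l9"), F = c("l2", "l10"), G = c("l3", "l11"), H = c("l4", "l12"),
  I = c("l5", "l13"), J = c("l6", "l14"), K = c("l7", "l15"), L = c("l8", "l16"),
  stringsAsFactors = FALSE
)
nid <- 4  # number of id columns
k <- 4    # number of new columns

# One group label per column: 0 for the id columns, then 1, 2, ... per block.
# c() starts with a plain numeric vector, so the factor from gl() is reduced
# to its integer codes.
grp <- c(rep(0, nid), gl((ncol(dd) - nid) / k, k))
grp
# 0 0 0 0 1 1 1 1 2 2 2 2

# split.default treats the data frame as a list of columns, so it splits
# column-wise and returns one data frame per label
s <- split.default(dd, grp)
lapply(s, names)
```

So s[[1]] holds the id columns, and every other element is one block of k columns that gets renamed and bound back onto the ids.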

Note

dd <- 
structure(list(A = c("w1", "w5"), B = c("w2", "w6"), C = c("w3", 
"w7"), D = c("w4", "w8"), E = c("l1", "l9"), F = c("l2", "l10"
), G = c("l3", "l11"), H = c("l4", "l12"), I = c("l5", "l13"), 
    J = c("l6", "l14"), K = c("l7", "l15"), L = c("l8", "l16"
    )), class = "data.frame", row.names = c(NA, -2L))

Upvotes: 4
