NelsonGon
NelsonGon

Reputation: 13319

Reduce repeated pivoting to a single pivot

Using tidyr >= 1.0.0, one can use tidy selection in the cols argument as follows:

library(dplyr)
library(tidyr)
df %>% 
pivot_longer(cols=starts_with("DL_TM"),
               names_to = "TM",values_to = "DM_TM") %>% 
  pivot_longer(cols=starts_with("DL_CD"),
               names_to = "CD",values_to = "DL_CD") %>% 
  na.omit() %>% 
  select(-TM,-CD)

However, the above will quickly get cumbersome(repetitive) with many columns, how can one reduce this to single pivoting?! I have imagined something conceptual like pivot_longer(cols=starts_with("DL_TM | DL_CD")....) which will obviously not work because tidy selection only works for a single pattern(as far as I know).

Data

df <- structure(list(DL_TM1 = c(16L, 18L, 53L, 59L, 29L, 3L), DL_CD1 = c("AK", 
"RB", "RA", "AJ", "RA", "RS"), DL_TM2 = c(5L, 4L, 8L, NA, 1L, 
NA), DL_CD2 = c("CN", "AJ", "RB", NA, "AJ", NA), DL_TM3 = c(NA, 
NA, 2L, NA, NA, NA), DL_CD3 = c(NA, NA, "AJ", NA, NA, NA), DL_TM4 = c(NA, 
NA, NA, NA, NA, NA), DL_CD4 = c(NA, NA, NA, NA, NA, NA), DL_TM5 = c(NA, 
NA, NA, NA, NA, NA), DL_CD5 = c(NA, NA, NA, NA, NA, NA), DEP_DELAY_TM = c(21L, 
22L, 63L, 59L, 30L, 3L)), class = "data.frame", row.names = c(NA, 
-6L))

Expected Output: Same as the above but with single pivoting.

Upvotes: 3

Views: 110

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269624

Based on the response to the comment that this was moved from the code in the question does not actually produce the desired result and what was wanted was the result that this produces:

df %>%
  pivot_longer(-DEP_DELAY_TM, names_to = c(".value", "X"), 
    names_pattern = "(\\D+)(\\d)") %>%
  select(-X) %>%
  drop_na

giving:

# A tibble: 11 x 3
   DEP_DELAY_TM DL_TM DL_CD
          <int> <int> <chr>
 1           21    16 AK   
 2           21     5 CN   
 3           22    18 RB   
 4           22     4 AJ   
 5           63    53 RA   
 6           63     8 RB   
 7           63     2 AJ   
 8           59    59 AJ   
 9           30    29 RA   
10           30     1 AJ   
11            3     3 RS   

Base R

We can alternately do this using base R's reshape. First split the column names (except the last column) by the non-digit parts giving the varying list and then reshape df to long form using that and finally run na.omit to remove the rows with NAs.

nms1 <- head(names(df), -1)
varying <- split(nms1, gsub("\\d", "", nms1))
na.omit(reshape(df, dir = "long", varying = varying, v.names = names(varying)))

giving:

    DEP_DELAY_TM time DL_CD DL_TM id
1.1           21    1    AK    16  1
2.1           22    1    RB    18  2
3.1           63    1    RA    53  3
4.1           59    1    AJ    59  4
5.1           30    1    RA    29  5
6.1            3    1    RS     3  6
1.2           21    2    CN     5  1
2.2           22    2    AJ     4  2
3.2           63    2    RB     8  3
5.2           30    2    AJ     1  5
3.3           63    3    AJ     2  3

Upvotes: 4

eipi10
eipi10

Reputation: 93811

We can extract the column groupings ("TM" and "CD" in this case), map over each column group to apply pivot_longer to that group, and then full_join the resulting list elements. Let me know if this covers your real-world use case.

suffixes = unique(gsub(".*_(.{2})[0-9]*", "\\1", names(df)))

df.long = suffixes %>% 
  map(~ df %>% 
         mutate(id=1:n()) %>%  # Ensure unique identification of each original data row
         select(id, DEP_DELAY_TM, starts_with(paste0("DL_",.x))) %>%
         pivot_longer(cols=-c(DEP_DELAY_TM, id),
                      names_to=.x,
                      values_to=paste0(.x,"_value")) %>% 
         na.omit() %>% 
         select(-matches(paste0("^",.x,"$")))
  ) %>% 
  reduce(full_join) %>% 
  select(-id)
   DEP_DELAY_TM TM_value CD_value
1            21       16       AK
2            21       16       CN
3            21        5       AK
4            21        5       CN
5            22       18       RB
6            22       18       AJ
7            22        4       RB
8            22        4       AJ
9            63       53       RA
10           63       53       RB
11           63       53       AJ
12           63        8       RA
13           63        8       RB
14           63        8       AJ
15           63        2       RA
16           63        2       RB
17           63        2       AJ
18           59       59       AJ
19           30       29       RA
20           30       29       AJ
21           30        1       RA
22           30        1       AJ
23            3        3       RS

Upvotes: 3

Related Questions