Reputation: 13319
Using tidyr
>= 1.0.0, one can use tidy
selection in the cols
argument as follows:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols=starts_with("DL_TM"),
names_to = "TM",values_to = "DM_TM") %>%
pivot_longer(cols=starts_with("DL_CD"),
names_to = "CD",values_to = "DL_CD") %>%
na.omit() %>%
select(-TM,-CD)
However, the above will quickly get cumbersome(repetitive) with many columns, how can one reduce this to single pivot
ing?! I have imagined something conceptual like
pivot_longer(cols=starts_with("DL_TM | DL_CD")....)
which will obviously not work because tidy
selection only works for a single pattern(as far as I know).
Data
df <- structure(list(DL_TM1 = c(16L, 18L, 53L, 59L, 29L, 3L), DL_CD1 = c("AK",
"RB", "RA", "AJ", "RA", "RS"), DL_TM2 = c(5L, 4L, 8L, NA, 1L,
NA), DL_CD2 = c("CN", "AJ", "RB", NA, "AJ", NA), DL_TM3 = c(NA,
NA, 2L, NA, NA, NA), DL_CD3 = c(NA, NA, "AJ", NA, NA, NA), DL_TM4 = c(NA,
NA, NA, NA, NA, NA), DL_CD4 = c(NA, NA, NA, NA, NA, NA), DL_TM5 = c(NA,
NA, NA, NA, NA, NA), DL_CD5 = c(NA, NA, NA, NA, NA, NA), DEP_DELAY_TM = c(21L,
22L, 63L, 59L, 30L, 3L)), class = "data.frame", row.names = c(NA,
-6L))
Expected Output:
Same as the above but with single pivot
ing.
Upvotes: 3
Views: 110
Reputation: 269624
Based on the response to the comment that this was moved from the code in the question does not actually produce the desired result and what was wanted was the result that this produces:
df %>%
pivot_longer(-DEP_DELAY_TM, names_to = c(".value", "X"),
names_pattern = "(\\D+)(\\d)") %>%
select(-X) %>%
drop_na
giving:
# A tibble: 11 x 3
DEP_DELAY_TM DL_TM DL_CD
<int> <int> <chr>
1 21 16 AK
2 21 5 CN
3 22 18 RB
4 22 4 AJ
5 63 53 RA
6 63 8 RB
7 63 2 AJ
8 59 59 AJ
9 30 29 RA
10 30 1 AJ
11 3 3 RS
We can alternately do this using base R's reshape
. First split the column names (except the last column) by the non-digit parts giving the varying
list and then reshape df
to long form using that and finally run na.omit
to remove the rows with NAs.
nms1 <- head(names(df), -1)
varying <- split(nms1, gsub("\\d", "", nms1))
na.omit(reshape(df, dir = "long", varying = varying, v.names = names(varying)))
giving:
DEP_DELAY_TM time DL_CD DL_TM id
1.1 21 1 AK 16 1
2.1 22 1 RB 18 2
3.1 63 1 RA 53 3
4.1 59 1 AJ 59 4
5.1 30 1 RA 29 5
6.1 3 1 RS 3 6
1.2 21 2 CN 5 1
2.2 22 2 AJ 4 2
3.2 63 2 RB 8 3
5.2 30 2 AJ 1 5
3.3 63 3 AJ 2 3
Upvotes: 4
Reputation: 93811
We can extract the column groupings ("TM" and "CD" in this case), map
over each column group to apply pivot_longer
to that group, and then full_join
the resulting list elements. Let me know if this covers your real-world use case.
suffixes = unique(gsub(".*_(.{2})[0-9]*", "\\1", names(df)))
df.long = suffixes %>%
map(~ df %>%
mutate(id=1:n()) %>% # Ensure unique identification of each original data row
select(id, DEP_DELAY_TM, starts_with(paste0("DL_",.x))) %>%
pivot_longer(cols=-c(DEP_DELAY_TM, id),
names_to=.x,
values_to=paste0(.x,"_value")) %>%
na.omit() %>%
select(-matches(paste0("^",.x,"$")))
) %>%
reduce(full_join) %>%
select(-id)
DEP_DELAY_TM TM_value CD_value 1 21 16 AK 2 21 16 CN 3 21 5 AK 4 21 5 CN 5 22 18 RB 6 22 18 AJ 7 22 4 RB 8 22 4 AJ 9 63 53 RA 10 63 53 RB 11 63 53 AJ 12 63 8 RA 13 63 8 RB 14 63 8 AJ 15 63 2 RA 16 63 2 RB 17 63 2 AJ 18 59 59 AJ 19 30 29 RA 20 30 29 AJ 21 30 1 RA 22 30 1 AJ 23 3 3 RS
Upvotes: 3