Carrol
Carrol

Reputation: 1285

Wide to long, combining columns in pairs but keeping ID column - R

I have a dataframe of the following type

ID case1 case2 case3 case4
1  A     B     C     D
2  B     A
3  E     F
4  G     C     A
5  T

I need to change its format, to a long shape, similar as the below:

ID col1 col2
1  A    B
1  A    C
1  A    D
1  B    C
1  B    D
1  C    D
2  B    A
3  E    F
4  G    C
4  G    A
4  C    A
5  T

As you can see, I need to maintain the ID and ignore empty columns. There are some cases like T that need to remain in the dataset, but without a col2.

I am honestly not sure how to approach this, so that is why there are no examples of what I have tried.

Upvotes: 0

Views: 38

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388797

You can get the data in long format and create all combination of values for each ID if the number of rows is greater than 1 in that ID.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -ID, values_drop_na = TRUE) %>%
  group_by(ID) %>%
  summarise(value = if(n() > 1) list(setNames(as.data.frame(t(combn(value, 2))), 
                               c('col1', 'col2'))) 
                 else list(data.frame(col1 = value[1], col2 = NA_character_))) %>%
  unnest(value)


# A tibble: 12 x 3
#      ID col1  col2 
#   <int> <chr> <chr>
# 1     1 A     B    
# 2     1 A     C    
# 3     1 A     D    
# 4     1 B     C    
# 5     1 B     D    
# 6     1 C     D    
# 7     2 B     A    
# 8     3 E     F    
# 9     4 G     C    
#10     4 G     A    
#11     4 C     A    
#12     5 T     NA   

data

df <- structure(list(ID = 1:5, case1 = c("A", "B", "E", "G", "T"), 
case2 = c("B", "A", "F", "C", NA), case3 = c("C", NA, NA, 
"A", NA), case4 = c("D", NA, NA, NA, NA)), 
class = "data.frame", row.names = c(NA, -5L))

Upvotes: 2

Related Questions