Micky

Reputation: 300

Combining elements of one column into two columns by group in R

Given a two-column data.frame, with one column containing group labels and the other containing integer values ordered from smallest to largest, how can the data be expanded into pairs of combinations of the integer column?

I'm not sure of the best way to state this. I'm not interested in every possible ordered pair, only in the unique pairs, each starting from the lower value.

In R, the combn() function gives the desired output when groups are not considered, for example:

t(combn(1:4, 2))
     [,1] [,2]
[1,]    1    2
[2,]    1    3
[3,]    1    4
[4,]    2    3
[5,]    2    4
[6,]    3    4

Since the first value is 1, we get the unique combination (1,2) and not the additional combination (2,1), which I don't need. How would one apply a similar method by group?

For example, given a data.frame:

test <- data.frame(Group = rep(c("A","B"),each=4),
                   Val = c(1,3,6,8,2,4,5,7))
test
  Group Val
1     A   1
2     A   3
3     A   6
4     A   8
5     B   2
6     B   4
7     B   5
8     B   7

I was able to come up with this solution that gives the desired output:

library(dplyr)  # provides filter()

test <- data.frame(Group = rep(c("A","B"),each=4),
                   Val = c(1,3,6,8,2,4,5,7))
j=1
for(i in unique(test$Group)){
  if(j==1){
    # first group: start the result data.frame
    one <- filter(test,i == Group)
    two <- data.frame(t(combn(one$Val,2)))
    test1 <- data.frame(Group = i,Val1=two$X1,Val2=two$X2)
    j=j+1
  }else{
    # later groups: build the pairs and append them to the result
    one <- filter(test,i == Group)
    two <- data.frame(t(combn(one$Val,2)))
    test2 <- data.frame(Group = i,Val1=two$X1,Val2=two$X2)
    test1 <- rbind(test1,test2)
  }
}

test1
   Group Val1 Val2
1      A    1    3
2      A    1    6
3      A    1    8
4      A    3    6
5      A    3    8
6      A    6    8
7      B    2    4
8      B    2    5
9      B    2    7
10     B    4    5
11     B    4    7
12     B    5    7

However, this is not elegant and is really slow as the number of groups and length of each group become large. It seems like there should be a more elegant and efficient solution but so far I have not come across anything on SO.

I would appreciate any ideas!

Upvotes: 1

Views: 672

Answers (3)

Jakub.Novotny

Reputation: 3047

library(tidyverse)

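# uses the `test` data.frame from the question
df2 <- split(test$Val, test$Group) %>%
  # n must match the number of values in each group, so compute it instead of hard-coding 4
  map(~gtools::combinations(n = length(.x), r = 2, v = .x)) %>%
  map(~as_tibble(.x, .name_repair = "unique")) %>%
  bind_rows(.id = "Group")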
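
The same split, combine, bind idea can also be sketched in base R, without gtools or the tidyverse. This is only a minimal sketch, not a benchmarked solution: the names by_group, pairs and result are made up for illustration, and it assumes every group has at least two values (combn() treats a single positive integer n as seq_len(n)).

```r
test <- data.frame(Group = rep(c("A", "B"), each = 4),
                   Val = c(1, 3, 6, 8, 2, 4, 5, 7))

# Split the values by group (assumed: at least two values per group).
by_group <- split(test$Val, test$Group)

# Build one small data.frame per group; combn() returns a 2-row matrix
# with one pair per column, in the order the values appear.
pairs <- lapply(names(by_group), function(g) {
  m <- combn(by_group[[g]], 2)
  data.frame(Group = g, Val1 = m[1, ], Val2 = m[2, ])
})

# Bind everything once at the end instead of growing the result in a loop.
result <- do.call(rbind, pairs)
```

Binding once at the end avoids the repeated rbind() calls in the question's loop, which copy the accumulated result on every iteration.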

Upvotes: 0

Darren Tsai

Reputation: 35554

You can set simplify = FALSE in combn() and then use unnest_wider() from tidyr.

library(dplyr)
library(tidyr)

test %>%
  group_by(Group) %>% 
  summarise(Val = combn(Val, 2, simplify = FALSE)) %>% 
  unnest_wider(Val, names_sep = "_")

#    Group Val_1 Val_2
#    <chr> <dbl> <dbl>
#  1 A         1     3
#  2 A         1     6
#  3 A         1     8
#  4 A         3     6
#  5 A         3     8
#  6 A         6     8
#  7 B         2     4
#  8 B         2     5
#  9 B         2     7
# 10 B         4     5
# 11 B         4     7
# 12 B         5     7
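
If you are on dplyr 1.1.0 or later, a summarise() call that returns several rows per group is deprecated in favour of reframe(). A hedged sketch of the same pipeline with that one verb swapped, assuming a recent dplyr and tidyr are installed (res is just an illustrative name):

```r
library(dplyr)
library(tidyr)

test <- data.frame(Group = rep(c("A", "B"), each = 4),
                   Val = c(1, 3, 6, 8, 2, 4, 5, 7))

res <- test %>%
  group_by(Group) %>%
  # reframe() allows any number of rows per group and returns an ungrouped result
  reframe(Val = combn(Val, 2, simplify = FALSE)) %>%
  # each list element is a length-2 vector; spread it into Val_1 and Val_2
  unnest_wider(Val, names_sep = "_")
```

Unlike summarise(), reframe() always drops the grouping, so no ungroup() is needed afterwards.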

Upvotes: 1

Wimpel

Reputation: 27732

Here is a data.table approach:

library( data.table )
#make test a data.table
setDT(test)
#split by group
L <- split( test, by = "Group")
#get unique combinations of 2 Vals 
L2 <- lapply( L, function(x) {
  as.data.table( t( combn( x$Val, m = 2, simplify = TRUE ) ) )
})
#merge them back together
data.table::rbindlist( L2, idcol = "Group" )

#    Group V1 V2
# 1:     A  1  3
# 2:     A  1  6
# 3:     A  1  8
# 4:     A  3  6
# 5:     A  3  8
# 6:     A  6  8
# 7:     B  2  4
# 8:     B  2  5
# 9:     B  2  7
#10:     B  4  5
#11:     B  4  7
#12:     B  5  7

Upvotes: 2
