mrhd
mrhd

Reputation: 1076

tidyverse: all permutations of categories

Here's a problem: I have all possible combinations of M elements from a set of N elements (N choose M). Each combination has a value assigned.

An example for N = 5 and M = 3:

library(tidyverse)

df <- letters[1:5] %>% combn( m = 3 ) %>% t() %>% 
  as_tibble( .name_repair = function(x) {paste0('id', 1:length(x))} )
df$val <- runif( nrow(df) )

Which gives a set of 10 combinations:

# A tibble: 10 x 4
   id1   id2   id3      val
   <chr> <chr> <chr>  <dbl>
 1 a     b     c     0.713 
 2 a     b     d     0.314 
 3 a     b     e     0.831 
 4 a     c     d     0.555 
 5 a     c     e     0.915 
 6 a     d     e     0.954 
 7 b     c     d     0.131 
 8 b     c     e     0.0583
 9 b     d     e     0.533 
10 c     d     e     0.857 

Now I would like to add the combinations such that the results represents selection of M elements without replacement (N!/(N-M)!), but conserving the values for each set of M elements.

So, staying with the example, the result should contain 543=60 rows. For the example, I can do it in a "manual" permutation of the columns:

# add missing combinations
df_perm <- df %>% bind_rows(
  # 1, 3, 2
  df %>% mutate( tmp = id2, id2 = id3, id3 = tmp ) %>%
    select( -tmp )
) %>% bind_rows(
  # 2, 1, 3
  df %>% mutate( tmp = id1, id1 = id2, id2 = tmp ) %>%
    select( -tmp )
) %>% bind_rows(
  # 2, 3, 1
  df %>% mutate( tmp = id1, id1 = id2, id2 = id3, id3 = tmp ) %>%
    select( -tmp )
) %>% bind_rows(
  # 3, 1, 2
  df %>% mutate( tmp = id2, id2 = id1, id1 = id3, id3 = tmp ) %>%
    select( -tmp )
) %>% bind_rows(
  # 3, 2, 1
  df %>% mutate( tmp = id3, id3 = id1, id1 = tmp ) %>%
    select( -tmp )
)

However, this becomes unfeasible quickly for M>3.

What would be a more elegant way to achieve the same result?

Upvotes: 1

Views: 743

Answers (1)

M. Rodo
M. Rodo

Reputation: 486

As I read your question, it essentially seems that you have assigned a value to each possible combination of size M from a set of size N. You would then like to map the value for each combination to its permutations.

For example, if the combination a, b, d has a value of 0.4, then you would like a, b, d, a, d, b, b, a, d, b, d, a, d, b, a and d, a, b to have a value of 0.4.

First, get all possible permutations of the vector 1:M, where M is the number of elements per combination as defined above:

M <- 3
perm_mat <- gtools::permutations(M, M)

Then permute the columns of the df as per the above permutations:

perm_df <- purrr::map_df(1:nrow(perm_mat), function(i){
  df_curr <- df[,c(perm_mat[i,], M+1)]
  colnames(df_curr) <- colnames(df)
  df_curr
})

This produces the following output (first twenty rows):

   V1    V2    V3       val
   <chr> <chr> <chr>  <dbl>
 1 a     b     c     0.0682
 2 a     b     d     0.735 
 3 a     b     e     0.0336
 4 a     c     d     0.965 
 5 a     c     e     0.889 
 6 a     d     e     0.796 
 7 b     c     d     0.792 
 8 b     c     e     0.508 
 9 b     d     e     0.606 
10 c     d     e     0.623 
11 a     c     b     0.0682
12 a     d     b     0.735 
13 a     e     b     0.0336
14 a     d     c     0.965 
15 a     e     c     0.889 
16 a     e     d     0.796 
17 b     d     c     0.792 
18 b     e     c     0.508 
19 b     e     d     0.606 
20 c     e     d     0.623 

Note that the numbers in the values column are different to the original post simply because I used a different seed before running runif.

Upvotes: 3

Related Questions