Reputation: 1076
Here's a problem: I have all possible combinations of M elements from a set of N elements (N choose M). Each combination has a value assigned.
An example for N = 5 and M = 3:
library(tidyverse)
df <- letters[1:5] %>% combn( m = 3 ) %>% t() %>%
as_tibble( .name_repair = function(x) {paste0('id', 1:length(x))} )
df$val <- runif( nrow(df) )
Which gives a set of 10 combinations:
# A tibble: 10 x 4
id1 id2 id3 val
<chr> <chr> <chr> <dbl>
1 a b c 0.713
2 a b d 0.314
3 a b e 0.831
4 a c d 0.555
5 a c e 0.915
6 a d e 0.954
7 b c d 0.131
8 b c e 0.0583
9 b d e 0.533
10 c d e 0.857
Now I would like to add the combinations such that the results represents selection of M elements without replacement (N!/(N-M)!), but conserving the values for each set of M elements.
So, staying with the example, the result should contain 543=60 rows. For the example, I can do it in a "manual" permutation of the columns:
# add missing combinations
df_perm <- df %>% bind_rows(
# 1, 3, 2
df %>% mutate( tmp = id2, id2 = id3, id3 = tmp ) %>%
select( -tmp )
) %>% bind_rows(
# 2, 1, 3
df %>% mutate( tmp = id1, id1 = id2, id2 = tmp ) %>%
select( -tmp )
) %>% bind_rows(
# 2, 3, 1
df %>% mutate( tmp = id1, id1 = id2, id2 = id3, id3 = tmp ) %>%
select( -tmp )
) %>% bind_rows(
# 3, 1, 2
df %>% mutate( tmp = id2, id2 = id1, id1 = id3, id3 = tmp ) %>%
select( -tmp )
) %>% bind_rows(
# 3, 2, 1
df %>% mutate( tmp = id3, id3 = id1, id1 = tmp ) %>%
select( -tmp )
)
However, this becomes unfeasible quickly for M>3.
What would be a more elegant way to achieve the same result?
Upvotes: 1
Views: 743
Reputation: 486
As I read your question, it essentially seems that you have assigned a value to each possible combination of size M from a set of size N. You would then like to map the value for each combination to its permutations.
For example, if the combination a, b, d
has a value of 0.4, then you would like a, b, d
, a, d, b
, b, a, d
, b, d, a
, d, b, a
and d, a, b
to have a value of 0.4.
First, get all possible permutations of the vector 1:M
, where M
is the number of elements per combination as defined above:
M <- 3
perm_mat <- gtools::permutations(M, M)
Then permute the columns of the df
as per the above permutations:
perm_df <- purrr::map_df(1:nrow(perm_mat), function(i){
df_curr <- df[,c(perm_mat[i,], M+1)]
colnames(df_curr) <- colnames(df)
df_curr
})
This produces the following output (first twenty rows):
V1 V2 V3 val
<chr> <chr> <chr> <dbl>
1 a b c 0.0682
2 a b d 0.735
3 a b e 0.0336
4 a c d 0.965
5 a c e 0.889
6 a d e 0.796
7 b c d 0.792
8 b c e 0.508
9 b d e 0.606
10 c d e 0.623
11 a c b 0.0682
12 a d b 0.735
13 a e b 0.0336
14 a d c 0.965
15 a e c 0.889
16 a e d 0.796
17 b d c 0.792
18 b e c 0.508
19 b e d 0.606
20 c e d 0.623
Note that the numbers in the values
column are different to the original post simply because I used a different seed before running runif
.
Upvotes: 3