KohLP
KohLP

Reputation: 35

Getting distinct Combinations in R with repetition

I have a list of integers, say: (1,2,3,4,5)

I want to obtain all the possible lists of size 5, such that:

1. The lists can contain repeat elements, e.g. (1,1,1,2,2)

2. Ordering does not matter, e.g. (1,1,2,2,1) is the same as (1,1,1,2,2)

How do I obtain this whole list? I am actually looking for combinations of size 10 from a set of 10 integers.

Upvotes: 2

Views: 456

Answers (3)

Marcus Campbell
Marcus Campbell

Reputation: 2796

The link provided by Gregor seems to rely entirely on third-party packages to produce multisets, so I wanted to give you a base R solution. Note that the packages mentioned in that link will almost certainly be far more efficient for extremely large datasets.

We can use expand.grid() to first generate all possible permutations with repetition of the elements in (1,2,3,4,5). In this case, different orderings are still considered to be distinct. We now want to remove these "extra" combinations that contain the same elements but have different orders, which we can do by using apply() and duplicated().

If you use the multiset calculator here, you'll find that the code below produces the correct number of combinations. Here's the code:

x <- seq(1:5)

df <- expand.grid(x, x, x, x, x) # generates 5^5 combinations, allowing repetition

index <- !duplicated(t(apply(df, 1, sort))) # find extraneous combinations
df <- df[index, ] # select only unique combinations

# check number of rows. It should be 126; one for each combination
nrows(df)

# Output
# [1] 126

# Quick look at part of the dataframe:

head(df)
  Var1 Var2 Var3 Var4 Var5
1    1    1    1    1    1
2    2    1    1    1    1
3    3    1    1    1    1
4    4    1    1    1    1
5    5    1    1    1    1
7    2    2    1    1    1

Upvotes: 2

Gregor Thomas
Gregor Thomas

Reputation: 146144

Using the RcppAlgos solution recommended in this answer, we want to choose sets of 5 elements from your input, with repetition, and order doesn't matter (thus comboGeneral(), we would use permuteGeneral() if order mattered). Being coded in C++ under the hood, this will be a very fast solution, and the profiling in the linked answer also found it to be memory efficient. Generating the sets for 10 multichoose 10 still took less than a second on my laptop.

library(RcppAlgos)
x = 1:5
result = comboGeneral(x, m = 5, repetition = T)
dim(result)
# [1] 126   5
head(result)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    1    1    1    1
# [2,]    1    1    1    1    2
# [3,]    1    1    1    1    3
# [4,]    1    1    1    1    4
# [5,]    1    1    1    1    5
# [6,]    1    1    1    2    2

Upvotes: 2

Maurits Evers
Maurits Evers

Reputation: 50728

For a similar approach to @MarcusCampbell's within the tidyverse we can use expand to enumerate all possible combinations, and then only keep distinct combinations that are invariant under a permutation (i.e. where the ordering does not matter):

library(tidyverse);
tibble(V1 = 1:5, V2 = 1:5, V3 = 1:5, V4 = 1:5, V5 = 1:5) %>%
    expand(V1, V2, V3, V4, V5) %>%
    rowwise() %>%
    mutate(cmbn = paste(sort(c(V1, V2, V3, V4, V5)), collapse = ",")) %>%
    distinct(cmbn);
    ## A tibble: 126 x 1
    #   cmbn
    #   <chr>
    # 1 1,1,1,1,1
    # 2 1,1,1,1,2
    # 3 1,1,1,1,3
    # 4 1,1,1,1,4
    # 5 1,1,1,1,5
    # 6 1,1,1,2,2
    # 7 1,1,1,2,3
    # 8 1,1,1,2,4
    # 9 1,1,1,2,5
    #10 1,1,1,3,3
    ## ... with 116 more rows

Upvotes: 1

Related Questions