spore234

Reputation: 3650

all combinations of k numbers between 0 and n whose sum equals n, speed optimization

I have this R function to generate a matrix of all combinations of k numbers between 0 and n whose sum equals n. It is one of the bottlenecks of my program: it becomes extremely slow even with small numbers, because it first builds the full grid of (n+1)^k combinations and only then filters out the rows that sum to n.

Here is the code

sum.comb <- function(n, k) {
  ls1 <- list()                         # generate empty list
  for (i in 1:k) {                      # how could this be done with apply?
    ls1[[i]] <- 0:n                     # fill with 0:n
  }
  allc <- as.matrix(expand.grid(ls1))   # generate all combinations, already using the built-in function
  colnames(allc) <- NULL
  index <- (rowSums(allc) == n)         # index of the rows that sum to n
  allc[index, , drop = FALSE]           # matrix with only the ones that sum to n
}
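
For reference, here is what a small call returns (output obtained by running the function above): a matrix with one row per solution, each row summing to n.

sum.comb(3, 2)
#      [,1] [,2]
# [1,]    3    0
# [2,]    2    1
# [3,]    1    2
# [4,]    0    3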

Upvotes: 1

Views: 1312

Answers (5)

Thell

Reputation: 5958

See the partitions package, particularly compositions() and blockparts(). They'll be faster both as whole-matrix generators and for iterative operation. Then, if that still isn't fast enough, see the wide variety of publications on composition and partition generation algorithms (loopless, Gray code, and parallel), like Daniel Page's research.

library(partitions)
library(microbenchmark)

# rcpp_comps is an Rcpp implementation of compositions using loop
# free grey code, just for illustrative purposes.

# Just get the matrix
microbenchmark( compositions(3,10), compositions(10,3),
                blockparts(rep(10,3),10), blockparts(rep(3,10),3),
                rcpp_comps(10), times=10)
## Unit: microseconds
##                        expr    min     lq median   uq    max neval
##         compositions(3, 10) 1967.4 2050.9 2097.1 2173 3189.6    10
##         compositions(10, 3)  618.2  638.5  654.6  688  700.7    10
##  blockparts(rep(10, 3), 10)  612.2  620.8  645.6  663  963.5    10
##   blockparts(rep(3, 10), 3) 2057.2 2089.2 2176.0 2242 3116.4    10
##              rcpp_comps(10)  359.9  360.7  367.6  378  404.2    10
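
If I recall the partitions package's conventions correctly, compositions(n, k) returns one composition per column, so a transpose gives the same row-per-solution layout as the original function:

library(partitions)

# one solution per row, each summing to n (ordering may differ from sum.comb)
t(compositions(3, 2))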

Upvotes: 2

user14382

Reputation: 969

What about something short like:

comb = function(n, k) {
    all = combn(0:n, k)
    sums = colSums(all)
    all[, sums == n]
}

Then something like:

comb(5, 3)

which produces a matrix as you requested:

     [,1] [,2]
[1,]    0    0
[2,]    1    2
[3,]    4    3

Thanks to @josilber and the original poster for pointing out that the OP required all the permutations with repetition rather than combinations. A similar approach for the permutations would look like:

perm = function(n, k) {
    grid = matrix(rep(0:n, k), n + 1, k)
    all = expand.grid(data.frame(grid))
    sums = rowSums(all)
    all[sums == n,]
}

Then something like:

perm(5, 3)

produces a matrix as you requested:

    X1 X2 X3
6    5  0  0
11   4  1  0
16   3  2  0
21   2  3  0
26   1  4  0
31   0  5  0
...
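
If you need the same unnamed numeric matrix that the original sum.comb returns, a one-line conversion (my addition, not part of this answer) strips the data.frame names:

unname(as.matrix(perm(5, 3)))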

Upvotes: 1

flodel

Reputation: 89097

It is hard to tell if it will be useful unless you answer my question regarding your typical values for n and k (please do). Here is a version using recursion that seems to be faster than josilber's, using his benchmark test:

sum.comb3 <- function(n, k) {

   stopifnot(k > 0L)

   # recursively build all k-tuples of non-negative integers summing to n:
   # choose the first element i in 0:n, then prepend it to every
   # (k-1)-tuple summing to n - i
   REC <- function(n, k) {
      if (k == 1L) list(n) else
      unlist(lapply(0:n, function(i) Map(c, i, REC(n - i, k - 1L))),
             recursive = FALSE)
   }

   matrix(unlist(REC(n, k)), ncol = k, byrow = TRUE)
}
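
A quick sanity check (added here for reference, not part of the original answer):

sum.comb3(3, 2)
#      [,1] [,2]
# [1,]    0    3
# [2,]    1    2
# [3,]    2    1
# [4,]    3    0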

microbenchmark(sum.comb(3, 10), sum.comb2(3, 10), sum.comb3(3, 10))
# Unit: milliseconds
#              expr      min       lq   median       uq      max neval
#  sum.comb2(3, 10) 39.55612 40.60798 41.91954 44.26756 70.44944   100
#  sum.comb3(3, 10) 25.86008 27.74415 28.37080 29.65567 34.18620   100
# (timings for the original sum.comb omitted)

Upvotes: 4

josliber

Reputation: 44340

Here's a different approach, which incrementally expands the set from size 1 to k, at each iteration pruning the combinations whose sums exceed n. This should result in speedups where you have a large k relative to n, because you won't need to compute anything close to the size of the power set.

sum.comb2 <- function(n, k) {
  combos <- 0:n                    # partial combinations, stored as strings
  sums <- 0:n                      # running sums of the partial combinations
  for (width in 2:k) {
    combos <- apply(expand.grid(combos, 0:n), 1, paste, collapse=" ")
    sums <- apply(expand.grid(sums, 0:n), 1, sum)
    if (width == k) {
      return(combos[sums == n])    # at full width, keep only exact sums
    } else {
      combos <- combos[sums <= n]  # prune partial combinations that already exceed n
      sums <- sums[sums <= n]
    }
  }
}

# Simple test
sum.comb2(3, 2)
# [1] "3 0" "2 1" "1 2" "0 3"
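
Since sum.comb2 returns the solutions as space-separated strings rather than the numeric matrix asked for in the question, a small post-processing step (my own sketch, not part of this answer) converts them back:

to.matrix <- function(x) do.call(rbind, lapply(strsplit(x, " "), as.numeric))
to.matrix(sum.comb2(3, 2))
#      [,1] [,2]
# [1,]    3    0
# [2,]    2    1
# [3,]    1    2
# [4,]    0    3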

Here's an example of the speedups with small n and large k:

library(microbenchmark)
microbenchmark(sum.comb2(1, 100))
# Unit: milliseconds
#               expr      min      lq   median       uq      max neval
#  sum.comb2(1, 100) 149.0392 158.716 162.1919 174.0482 236.2095   100

This approach runs in under a second, while of course the approach with the power set would never get past the call to expand.grid, since you'll end up with 2^100 rows in your resulting matrix.

Even in a less extreme case, with n=3 and k=10, we see roughly a 20x speedup compared to the function in the original post:

microbenchmark(sum.comb(3, 10), sum.comb2(3, 10))
# Unit: milliseconds
#              expr       min        lq    median        uq       max neval
#   sum.comb(3, 10) 404.00895 439.94472 446.67452 461.24909 574.80426   100
#  sum.comb2(3, 10)  23.27445  24.53771  25.60409  26.97439  65.59576   100

Upvotes: 4

nfmcclure

Reputation: 3141

The following can be done with lapply.

ls1 <- list()
for(i in 1:k) {
  ls1[[i]] <- 0:n
}

Try substituting this in and see if you get any speed-up.

ls1 <- lapply(1:k, function(x) 0:n)

I changed 'ls' to 'ls1' because ls() is an R function.
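
As a quick check (my own addition), both constructions produce identical lists, so the substitution is a drop-in replacement:

n <- 3; k <- 4
ls1 <- list()
for (i in 1:k) ls1[[i]] <- 0:n
identical(ls1, lapply(1:k, function(x) 0:n))
# [1] TRUE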

Upvotes: 1
