user1752610
user1752610

Reputation: 75

All possible combinations of a set that sum to a target value

I have an input vector such as:

weights <- seq(0, 1, by = 0.2)

I would like to generate all the combinations of weights (repeats allowed) such that the sum is equal to 1. I came up with

l <- rep(list(weights), 10)
combinations <- expand.grid(l)
combinations[which(apply(combinations, 1, sum) == 1),]

The problem is of course I generate far more combinations that I need. Is there a way to get it done more efficiently?

EDIT: Thanks for the answers. That's the first part of the problem. As @Frank pointed out, now that I have all the "solutions" that add up to 1, the problem is to get all the permutations (not sure if it is the right word) from the solutions in a vector of length 10. For instance:

s1 <- c(0, 0, 0.2, 0, 0, 0, 0.8, 0, 0, 0)
s2 <- c(0.8, 0, 0, 0, 0, 0, 0, 0, 0.2, 0)
etc...

Upvotes: 4

Views: 6639

Answers (5)

ThomasIsCoding
ThomasIsCoding

Reputation: 101034

If you are planning to implement it using base R only, then an alternative approach is the recursion.

Assuming x <- c(1,2,4,8), and s <- 9 denotes the target sum, then the following function can get you there:

f <- function(s, x, xhead = head(x,1), r = c()) {
  if (s == 0) {
    return(list(r))
  } else {
    x <- sort(x,decreasing = T)
    return(unlist(lapply(x[x<=min(xhead,s)], function(k) f(round(s-k,10), x[x<= round(s-k,10)], min(k,head(x[x<=round(s-k,10)],1)), c(r,k))),recursive = F)) 
  }
}

which f(s,x) gives:

[[1]]
[1] 8 1

[[2]]
[1] 4 4 1

[[3]]
[1] 4 2 2 1

[[4]]
[1] 4 2 1 1 1

[[5]]
[1] 4 1 1 1 1 1

[[6]]
[1] 2 2 2 2 1

[[7]]
[1] 2 2 2 1 1 1

[[8]]
[1] 2 2 1 1 1 1 1

[[9]]
[1] 2 1 1 1 1 1 1 1

[[10]]
[1] 1 1 1 1 1 1 1 1 1

Note: round(*,digits=10) is used to take care of floating-point numbers, where digits should adapt to decimals of input.

Upvotes: 1

Geoffrey Poole
Geoffrey Poole

Reputation: 1268

If you want to use base R, here is a nifty bit of recursive code that I came up with for this problem; it returns results as a list, so isn't a complete answer to the specific question.

combnToSum = function(target, values, collapse = T) {

  if(any(values<=0)) stop("All values must be positive numbers.")

  appendValue = function(root) {
    if(sum(root) == target) return(list(root))

    candidates = values + sum(root) <= target
    if(length(root)>0 & collapse) candidates = candidates & values >= root[1]

    if(!any(candidates)) return(NULL)

    roots = lapply(values[candidates], c, root)
    return(unlist(lapply(roots, addValue), recursive = F))
  }

  appendValue(integer(0))
}

The code is fairly efficient, solving the test problem in a blink.

combnToSum(1, c(.2,.4,.6,.8,1))
# [[1]]
# [1] 0.2 0.2 0.2 0.2 0.2
#
# [[2]]
# [1] 0.4 0.2 0.2 0.2
#
# [[3]]
# [1] 0.6 0.2 0.2
#
# [[4]]
# [1] 0.4 0.4 0.2
#
# [[5]]
# [1] 0.8 0.2
#
# [[6]]
# [1] 0.6 0.4
#
# [[7]]
# [1] 1

An error can occur when values contains numbers that are small relative to target. For instance, trying to find all of the ways to make change for $10 US:

combnToSum(1000, c(1, 5, 10, 25))

yields the following error

# enter code here`Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

I have appendValue as a function nested within the scope of combnToSum so that target and values don't have to be copied and passed for each call (internally, within R). I also like the nice clean signature combnToSum(target, values); the user doesn't need to know about the intermediate value root.

That said, appendValue could be a separate function with the signature appendValue(target, values, root), in which case you could just use appendValue(1, c(0.2, 0.4, 0.6, 0.8, 1), integer(0)) to get the same answer. But you'd either lose the error check for negative values or, if you put the error check into appendValue, the error check would occur for each recursive call to the function, which seems a bit inefficient.

Setting collapse = F will return all of the permutations that have unique order.

combnToSum(1, c(.2,.4,.6,.8,1), collapse = F)
# [[1]]
# [1] 0.2 0.2 0.2 0.2 0.2
# 
# [[2]]
# [1] 0.4 0.2 0.2 0.2
# 
# [[3]]
# [1] 0.2 0.4 0.2 0.2
# 
# [[4]]
# [1] 0.6 0.2 0.2
# 
# [[5]]
# [1] 0.2 0.2 0.4 0.2
# 
# [[6]]
# [1] 0.4 0.4 0.2
# 
# [[7]]
# [1] 0.2 0.6 0.2
# 
# [[8]]
# [1] 0.8 0.2
# 
# [[9]]
# [1] 0.2 0.2 0.2 0.4
# 
# [[10]]
# [1] 0.4 0.2 0.4
# 
# [[11]]
# [1] 0.2 0.4 0.4
# 
# [[12]]
# [1] 0.6 0.4
# 
# [[13]]
# [1] 0.2 0.2 0.6
# 
# [[14]]
# [1] 0.4 0.6
# 
# [[15]]
# [1] 0.2 0.8
# 
# [[16]]
# [1] 1

Upvotes: 1

josliber
josliber

Reputation: 44299

Finding any subset of a set of integers that sums to some target t is a form of the subset sum problem, which is NP-complete. As a result, efficiently computing all the combinations (repeats allowed) of your set that sum to a target value is theoretically challenging.

To tractably solve a special case of the subset sum problem, let's recast your problem by assuming the input is positive integers (for your example w <- c(2, 4, 6, 8, 10); I won't consider non-positive integers or non-integers in this answer) and that the target is also a positive integer (in your example 10). Define D(i, j) to be the set of all combinations that sum to i among the first j elements of the set w. If there are n elements in w, then you are interested in D(t, n).

Let's start with a few base cases: D(0, k) = {{}} for all k >= 0 (the only way to sum to 0 is to include none of the elements) and D(k, 0) = {} for any k > 0 (you can't sum to a positive number with zero elements). Now consider the following pseudocode to compute arbitrary D(i, j) values:

for j = 1 ... n
  for i = 1 ... t
    D[(i, j)] = {}
    for rep = 0 ... floor(i/w_j)
      Dnew = D[(i-rep*w_j, j-1)], with w_j added "rep" times
      D[(i, j)] = Union(D[(i, j)], Dnew)

Note that this could still be quite inefficient (D(t, n) can contain an exponentially large number of feasible subsets so there is no avoiding this), but in many cases where there are a relatively small number of feasible combinations that sum to the target this could be quite a bit quicker than simply considering every single subset of the set (there are 2^n such subsets, so that approach always has exponential runtime).

Let's use R to code up your example:

w <- c(2, 4, 6, 8, 10)
n <- length(w)
t <- 10
D <- list()
for (j in 0:n) D[[paste(0, j)]] <- list(c())
for (i in 1:t) D[[paste(i, 0)]] <- list()
for (j in 1:n) {
  for (i in 1:t) {
    D[[paste(i, j)]] <- do.call(c, lapply(0:floor(i/w[j]), function(r) {
      lapply(D[[paste(i-r*w[j], j-1)]], function(x) c(x, rep(w[j], r)))
    }))
  }
}
D[[paste(t, n)]]
# [[1]]
# [1] 2 2 2 2 2
# 
# [[2]]
# [1] 2 2 2 4
# 
# [[3]]
# [1] 2 4 4
# 
# [[4]]
# [1] 2 2 6
# 
# [[5]]
# [1] 4 6
# 
# [[6]]
# [1] 2 8
# 
# [[7]]
# [1] 10

The code correctly identifies all combinations of elements in the set that sum to 10.

To efficiently get all 2002 unique length-10 combinations, we can use the allPerm function from the multicool package:

library(multicool)
out <- do.call(rbind, lapply(D[[paste(t, n)]], function(x) {
  allPerm(initMC(c(x, rep(0, 10-length(x)))))
}))
dim(out)
# [1] 2002   10
head(out)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    2    2    2    2    2    0    0    0    0     0
# [2,]    0    2    2    2    2    2    0    0    0     0
# [3,]    2    0    2    2    2    2    0    0    0     0
# [4,]    2    2    0    2    2    2    0    0    0     0
# [5,]    2    2    2    0    2    2    0    0    0     0
# [6,]    2    2    2    2    0    2    0    0    0     0

For the given input, the whole operation is pretty quick (0.03 seconds on my computer) and doesn't use a huge amount of memory. Meanwhile the solution in the original post ran in 22 seconds and used 15 GB of memory, even when replacing the last line to the (much) more efficient combinations[rowSums(combinations) == 1,].

Upvotes: 4

Rorschach
Rorschach

Reputation: 32416

Take a look at partitions library,

library(partitions)
ps <- parts(10)
res <- ps[,apply(ps, 2, function(x) all(x[x>0] %% 2 == 0))] / 10

Upvotes: 4

Chris Watson
Chris Watson

Reputation: 1367

For the combinations, do you want this:

combinations <- lapply(seq_along(weights), function(x) combn(weights, x))

Then for the sums:

sums <- lapply(combinations, colSums)
inds <- lapply(sums, function(x) which(x == 1))
lapply(seq_along(inds), function(x) combinations[[x]][, inds[[x]]])

Upvotes: 0

Related Questions