Kira Tebbe

Reputation: 586

Find all combinations of a set of numbers that add up to a certain total

I've seen a few solutions to similar problems, but they all require iteration over the number of items to be added together.

Here's my goal: from a list of numbers, find all of the combinations (without replacement) that add up to a certain total. For example, if I have the numbers 1,1,2,3,5 and a total of 5, it should return 5; 2,3; and 1,1,3.

I was trying to use combn, but it requires you to specify the number of items in each combination. Is there a way to do this that allows for solution sets of any size?

Upvotes: 10

Views: 6752

Answers (5)

Joseph Wood

Reputation: 7597

This is precisely what combo/permuteGeneral from RcppAlgos (I am the author) were built for. Since we have repetition of specific elements in our sample vector, we will be finding combinations of multisets that meet our criteria. Note that this is different than the more common case of generating combinations with repetition where each element is allowed to be repeated m times. For many combination generating functions, multisets pose problems as duplicates are introduced and must be dealt with. This can become a bottleneck in your code if the size of your data is decently large. The functions in RcppAlgos handle these cases efficiently without creating any duplicate results. I should mention that there are a couple of other great libraries that handle multisets quite well: multicool and arrangements.
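
A quick side-by-side sketch of that distinction (my addition, not from the answer; I believe comboGeneral takes a repetition flag for the with-repetition case, but treat that argument name as an assumption; freqs is used in the answer's own code below):

library(RcppAlgos)

## With repetition: any value may appear up to m times in a combination,
## so pairs like (2,2) and (3,3) are produced.
comboGeneral(c(1, 2, 3), 2, repetition = TRUE)   # `repetition` flag assumed

## Multiset via freqs: 1 occurs twice in the data, 2 and 3 once each,
## so (2,2) and (3,3) never appear and no duplicate rows are generated.
comboGeneral(c(1, 2, 3), 2, freqs = c(2, 1, 1))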

Moving on to the task at hand, we can utilize the constraint arguments of comboGeneral to find all combinations of our vector that meet a specific criteria:

vec <- c(1,1,2,3,5)  ## using variables from @r2evans
uni <- unique(vec)
myRep <- rle(vec)$lengths
ans <- 5

library(RcppAlgos)
lapply(seq_along(uni), function(x) {
    comboGeneral(uni, x, freqs = myRep,
                 constraintFun = "sum",
                 comparisonFun = "==",
                 limitConstraints = ans)
})

[[1]]
[,1]
[1,]    5

[[2]]
[,1] [,2]
[1,]    2    3

[[3]]
[,1] [,2] [,3]
[1,]    1    1    3

[[4]]
[,1] [,2] [,3] [,4]  ## no solutions of length 4

These functions are highly optimized and extend well to larger cases. For example, consider the following case, which involves over 30 million candidate combinations:

## N.B. Using R 4.0.0 with the updated sample() RNG introduced in R 3.6.0
set.seed(42)
bigVec <- sort(sample(1:30, 40, TRUE))

rle(bigVec)
Run Length Encoding
  lengths: int [1:22] 2 1 2 3 4 1 1 1 2 1 ...
  values : int [1:22] 1 2 3 4 5 7 8 9 10 11 ...

bigUni <- unique(bigVec)
bigRep <- rle(bigVec)$lengths
bigAns <- 199
len <- 12

comboCount(bigUni, len, freqs = bigRep)
[1] 32248100

All 280,000+ qualifying results are returned very quickly:

system.time(bigTest <- comboGeneral(bigUni, len, freqs = bigRep,
                                    constraintFun = "sum",
                                    comparisonFun = "==",
                                    limitConstraints = bigAns))
 user  system elapsed 
0.273   0.004   0.271

head(bigTest)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    1    1    2    3    4   25   26   26   26    27    28    30
[2,]    1    1    2    3    5   24   26   26   26    27    28    30
[3,]    1    1    2    3    5   25   25   26   26    27    28    30
[4,]    1    1    2    3    7   24   24   26   26    27    28    30
[5,]    1    1    2    3    7   24   25   25   26    27    28    30
[6,]    1    1    2    3    7   24   25   26   26    26    28    30

nrow(bigTest)
[1] 280018

all(rowSums(bigTest) == bigAns)
[1] TRUE

Addendum

I must mention that generally, when I see a problem like "find all combinations that sum to a particular number", my first thought is integer partitions. For example, the related problem Getting all combinations which sum up to 100 in R can easily be solved with the partitions library. However, this approach does not extend to the general case we have here, where the vector contains specific repetition, or to vectors whose values don't map cleanly to integers. E.g., the vector c(0.1, 0.2, 0.3, 0.4) can easily be treated as 1:4, but treating c(3.98486, 7.84692, 0.0038937, 7.4879) as integers and subsequently applying an integer-partitions approach would require an extravagant amount of computing power, rendering this method useless.
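
For readers who want to see the integer-partitions idea in action on the small example from the question, here is a minimal sketch (my addition, not part of the original answer): enumerate all partitions of the target with the partitions package, then keep only those that can actually be drawn from the multiset. As noted above, this does not help with the floating-point case.

library(partitions)

vec <- c(1, 1, 2, 3, 5)
ans <- 5

## Each column of parts(ans) is one integer partition of ans, zero-padded.
allParts <- parts(ans)

## Keep a partition only if it is a sub-multiset of vec
## (every value occurs in vec at least as often as in the partition).
isSubMultiset <- function(p, v) {
    p <- p[p > 0]
    all(p %in% v) && all(table(p) <= table(v)[names(table(p))])
}

keep <- apply(allParts, 2, isSubMultiset, v = vec)
apply(allParts[, keep, drop = FALSE], 2, function(p) p[p > 0])
## expect the combinations 5; 3,2; and 3,1,1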

Upvotes: 11

niko

Reputation: 5281

Now here is a solution involving gtools:

# Create a list of all permutations of the vector x
df1 <- gtools::permutations(n = length(x), r = length(x), v = 1:length(x), repeats.allowed = FALSE)
ls1 <- list()
for (j in 1:nrow(df1)) ls1[[j]] <- x[df1[j, ]]

# Take all cumulative sums and keep the prefixes that hit our magic number
sumsCum <- t(vapply(1:length(ls1), function(j) cumsum(ls1[[j]]), numeric(length(x))))
indexMN <- which(sumsCum == magicNumber, arr.ind = TRUE)
finalList <- list()
for (j in 1:nrow(indexMN)) {
    magicRow <- indexMN[j, 1]
    magicCol <- 1:indexMN[j, 2]
    finalList[[j]] <- ls1[[magicRow]][magicCol]
}
# sort each prefix before deduplicating so reorderings of the same combination collapse
finalList <- unique(lapply(finalList, sort))

where x = c(1,1,2,3,5) and magicNumber = 5. This is a first draft; I am sure it can be improved here and there.

Upvotes: 5

mickey

Reputation: 2188

I took your combn idea and looped over the possible sizes of the sets.

func = function(x, total){
    M = length(x)
    y = NULL
    for (m in 1:M){
        tmp = combn(x, m)
        ind = which(colSums(tmp) == total)
        if (length(ind) > 0){
            for (j in 1:length(ind))
                y = c(y, list(tmp[, ind[j]]))
        }
    }
    return(unique(lapply(y, sort)))
}

x = c(1,1,2,3,5,8,13)

> func(x, 15)
[[1]]
[1]  2 13

[[2]]
[1]  1  1 13

[[3]]
[1] 2 5 8

[[4]]
[1] 1 1 5 8

[[5]]
[1] 1 1 2 3 8

Obviously, this will run into trouble as M grows, since tmp gets big pretty quickly and the length of y can't easily be pre-determined.

Upvotes: 6

moodymudskipper

Reputation: 47300

Not the most efficient, but the most compact so far. The trick: padding x with n-1 zeros lets a single combn(., n) call cover subsets of every size; the callback drops the zeros and returns the remaining values only when they sum to res, and the final [-1] removes the empty vector produced by non-matching combinations.

x <- c(1,1,2,3,5)
n <- length(x)
res <- 5
unique(combn(c(x,rep(0,n-1)), n, function(x) x[x!=0][sum(x)==res], FALSE))[-1]
# [[1]]
# [1] 1 1 3
# 
# [[2]]
# [1] 2 3
# 
# [[3]]
# [1] 5
# 

Upvotes: 3

r2evans

Reputation: 160407

Similar to mickey's answer, we can use combn inside another looping mechanism. I'll use lapply:

vec <- c(1,1,2,3,5)
ans <- 5

Filter(length, lapply(seq_len(length(vec)),
       function(i) {
         v <- combn(vec, i)
         v[, colSums(v) == ans, drop = FALSE]
       }))
# [[1]]
#      [,1]
# [1,]    5
# [[2]]
#      [,1]
# [1,]    2
# [2,]    3
# [[3]]
#      [,1]
# [1,]    1
# [2,]    1
# [3,]    3

You can omit the Filter(length, ...) wrapper, though the result may then contain a number of empty matrices. They're easy enough to deal with or ignore; I just thought removing them would be aesthetically preferable.

This method gives you, for each size, a matrix with one candidate combination per column (possibly several columns), so

ans <- 4
Filter(length, lapply(seq_len(length(vec)),
       function(i) {
         v <- combn(vec, i)
         v[, colSums(v) == ans, drop = FALSE]
       }))
# [[1]]
#      [,1] [,2]
# [1,]    1    1
# [2,]    3    3
# [[2]]
#      [,1]
# [1,]    1
# [2,]    1
# [3,]    2

If duplicates are a problem, you can always do:

Filter(length, lapply(seq_len(length(vec)),
       function(i) {
         v <- combn(vec, i)
         v <- v[, colSums(v) == ans, drop = FALSE]
         v[,!duplicated(t(v)),drop = FALSE]
       }))
# [[1]]
#      [,1]
# [1,]    1
# [2,]    3
# [[2]]
#      [,1]
# [1,]    1
# [2,]    1
# [3,]    2
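
If you would rather have a flat list of vectors like the other answers return, one hypothetical post-processing step (my sketch, not part of the original answer) is to split each matrix by column:

res <- Filter(length, lapply(seq_len(length(vec)),
       function(i) {
         v <- combn(vec, i)
         v <- v[, colSums(v) == ans, drop = FALSE]
         v[, !duplicated(t(v)), drop = FALSE]
       }))
## one combination per column, so split on the column index
unname(unlist(lapply(res, function(m) split(m, col(m))), recursive = FALSE))
# with ans == 4 this should give list(c(1, 3), c(1, 1, 2))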

Upvotes: 5
