R : Find combinations of columns which match specified result

Question

I have a data frame with hundreds of rows and tens of columns, and I need to find the combinations of rows whose sum falls within a given range. I searched for permutations, computations, and the subset sum problem, but none of the solutions I found seem to produce the desired result.

Please advise me whether a package exists that solves this kind of problem. Is there a function or a vectorised way to handle these kinds of "iterations" in R?

# sample dataframe
x <- data.frame(a=c("A","B","C","D"),b=c(1,2,1,1))

Let's assume the sum of any accepted combination is 3; then the desired result could be a list like

[[1]]     # combination 1
[1] 
1,2       # lists all rows used
[2]  
1,2       # lists all values used

[[2]]     # combination 2
[1]       
2,1       # lists all rows used
[2]
2,1       # lists all values used

[[3]]     # combination 3
[1]       
2,4       # lists all rows used
[2]       
2,1       # lists all values used

[[4]]     # combination 4
[1]
1,3,4     # lists all rows used
[2]       
1,1,1     # lists all values used

(# comments: these are only added for explanatory purposes)

Note:

not all possible combinations are required, and not all values have to be used!
a row must be used only ONCE in a given combination (i.e. summing row one three times is not an option!)
combinations could be sum(x[1:2,2]) as well as (x[1,2] + x[2,2] + .... + x[n,2])

bouncyball · Accepted Answer

I hope I understand your question correctly. Suppose we have some matrix dat and wish to sum up, for each column, the different combinations of rows. We can accomplish this using the *apply family of functions, along with combn.

Here's what we do:

Loop over the columns of our matrix (using apply)
For each column of our matrix, sum up the unique combinations of rows (using lapply and apply)
We generate the unique combinations using the combn function within an sapply call
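
As background for these steps, here is a small illustration of what combn returns. It is only a sketch of the building block, not part of the solution itself: the result is a matrix with one combination per column, which is why the combinations are later processed column by column.

```r
# All ways to pick 2 of the rows 1..5
combs <- combn(1:5, 2)

dim(combs)      # 2 rows, 10 columns: one combination per column
combs[, 1]      # first combination:  rows 1 and 2
combs[, 10]     # last combination:   rows 4 and 5
choose(5, 2)    # number of combinations of size 2, i.e. ncol(combs)

# Caveat: a single positive integer is expanded to seq_len(n),
# so combn(5, 2) produces the same combinations as combn(1:5, 2)
ncol(combn(5, 2))
```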

generate sample `dat`

set.seed(123)
dat <- matrix(rnorm(5 * 6), nrow = 5, ncol = 6)

loop through each column of `dat`

n_rows <- nrow(dat) # number of rows to combine (5 here)

big_list <- apply(dat, 2, FUN = function(matcol) # over the columns of dat
  lapply(sapply(seq_len(n_rows), FUN = function(x) combn(seq_len(n_rows), x),
                simplify = FALSE), # one matrix of combinations per size; always a list
         FUN = function(combs) 
           apply(combs, 2, # over the columns of combs (one combination each)
                 FUN = function(rows) 
                   data.frame(
                     'rows_used' = paste(rows, collapse = ', '), 
                     'n_rows' = length(rows), 
                     'sum' = sum(matcol[rows]))))) # sum the selected rows

[[1]] # column 
[[1]][[1]] #[[n_rows]][[n_comb]]
  rows_used n_rows        sum
1         1      1 -0.5604756

[[1]][[2]]
  rows_used n_rows        sum
1         2      1 -0.2301775

[[1]][[3]]
  rows_used n_rows      sum
1         3      1 1.558708

After we've gone through each column and each combination of rows, we can extract the data from the list into a data.frame. Suppose, for example, we were interested in the sums from column 6:

Using the results

column <- 6
df_from_list <- do.call('rbind',
        lapply(big_list[[column]], 
                 FUN = function(x) do.call('rbind', x)))

       rows_used n_rows        sum
1              1      1 -1.6866933
2              2      1  0.8377870
3              3      1  0.1533731
4              4      1 -1.1381369
5              5      1  1.2538149

Then, we can use the subset function (or dplyr::filter) to get all combinations of n rows in column 6 where the sum is >= 0 and <= 0.5:

subset(df_from_list, sum >= 0 & sum <= .5)

   rows_used n_rows       sum
3          3      1 0.1533731
15      4, 5      2 0.1156780
18   1, 2, 5      3 0.4049087
25   3, 4, 5      3 0.2690511
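
The same idea can be applied to the sample data frame from the question. The following is a minimal sketch rather than the code above: the helper name find_row_combos and its arguments are made up for illustration, and it returns one flat data frame instead of a nested list. It assumes the values to be summed are in a single numeric column.

```r
# Hypothetical helper: list every combination of entries of `values`
# whose sum lies in [lower, upper]. Each row is used at most once.
find_row_combos <- function(values, lower, upper, max_size = length(values)) {
  n <- length(values)
  per_size <- lapply(seq_len(max_size), function(k) {
    combs <- combn(n, k)                          # one combination per column
    vals  <- matrix(values[combs], nrow = k)      # the values behind each combination
    sums  <- colSums(vals)
    idx   <- which(sums >= lower & sums <= upper) # columns inside the range
    data.frame(
      rows_used   = vapply(idx, function(j) paste(combs[, j], collapse = ", "), character(1)),
      values_used = vapply(idx, function(j) paste(vals[, j], collapse = ", "), character(1)),
      n_rows      = rep(k, length(idx)),
      sum         = sums[idx],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per_size)
}

# Sample data from the question, target sum exactly 3
x <- data.frame(a = c("A", "B", "C", "D"), b = c(1, 2, 1, 1))
res <- find_row_combos(x$b, lower = 3, upper = 3)
res$rows_used   # "1, 2" "2, 3" "2, 4" "1, 3, 4"
```

Setting max_size limits how many rows a combination may contain, which keeps the enumeration manageable when only small combinations are of interest.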

Side note

It wouldn't surprise me if this calculation method doesn't scale well, and I'm sure there is a more efficient solution. The structure of the problem as I've solved it results in a nested list, so a user should be familiar with list objects in R.
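
To make the scaling concern concrete, n rows have 2^n - 1 non-empty combinations, so full enumeration is only practical for small n. The sketch below shows one possible alternative and is not part of the method above: a depth-first search that builds combinations row by row and abandons a branch once its running sum exceeds the upper bound. The pruning is only valid under the assumption that all values are non-negative. The names search_combos and max_results are hypothetical.

```r
# Number of non-empty row combinations for n rows
n_combinations <- function(n) 2^n - 1
n_combinations(5)    # 31, the size of the 5-row example
n_combinations(30)   # 1073741823

# Depth-first search with pruning. ASSUMPTION: all values are >= 0,
# so a running sum above `upper` can never come back into range.
search_combos <- function(values, lower, upper, max_results = Inf) {
  stopifnot(all(values >= 0))
  n <- length(values)
  results <- list()

  recurse <- function(start, rows, total) {
    if (length(results) >= max_results) return(invisible(NULL))
    if (length(rows) > 0 && total >= lower && total <= upper) {
      results[[length(results) + 1]] <<- rows   # record a matching combination
    }
    if (start > n) return(invisible(NULL))
    for (i in start:n) {
      new_total <- total + values[i]
      if (new_total > upper) next               # prune this branch
      recurse(i + 1, c(rows, i), new_total)     # rows after i only: no row is reused
    }
  }

  recurse(1, integer(0), 0)
  results
}

# Sample data from the question: finds rows (1,2), (1,3,4), (2,3), (2,4)
found <- search_combos(c(1, 2, 1, 1), lower = 3, upper = 3)

# Stop after the first two matches if not all combinations are needed
first_two <- search_combos(c(1, 2, 1, 1), lower = 3, upper = 3, max_results = 2)
```

The worst case is still exponential, so this helps only when the upper bound rules out most branches early or when a small max_results is enough.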
