Reputation: 43
I am wondering if there is a simple function to solve the following problem in R:
Suppose I have the following dataframe:
Now I know that the sum of various values in B equal the values in A, e.g. '3 + 4 + 3 = 10' and '17 + 18 = 35', which always balances out in the complete dataset.
Question
Is there a function that can sum these values in B, through trial and error I suppose, and match the correctly summed values with A? For example, the function tries to sum 3 + 4 + 18, which is 25 and retries this because 25 is not a value in A.
I have tried several solutions myself but one problem that I often encountered was the fact that A always has less observations than B.
I would be very thankful if someone can help me out here! If more info is needed please let me know.
Cheers,
Daan
Edit
This example is with simplified numbers. In reality, it is a large dataset, so I am looking for a scalable solution.
Thanks again!
Upvotes: 4
Views: 250
Reputation: 24945
This is a problem know as the subset sum problem, and there are a ton of examples online of how to solve it using dynamic programming, or greedy algorithms.
To give you an answer that just works, the package adagio
has an implementation:
library(adagio)
sums = c(10, 35, 90)
values = c(3, 4, 17, 18, 50, 40, 3)
for(i in sums){
#we have to subset the values to be less than the value
#otherwise the function errors:
print(subsetsum(values[values < i], i))
}
The output for each sum is a list, with the val and the indices in the array, so you can tidy up the output depending on what you want from there.
Upvotes: 1
Reputation: 1101
You can try the following but I am affraid is not scalable. For the case of 3 summands you have
x <- expand.grid(c(3, 4, 17, 18, 50, 40, 3),#building a matrix of the possible combinations of summands
c(3, 4, 17, 18, 50, 40, 3),
c(3, 4, 17, 18, 50, 40, 3))
x$sums <-rowSums(x) #new column with possible sums
idx<- x$sums%in%c(10, 35, 90) #checking the sums are in the required total
x[idx,]
Var1 Var2 Var3 sums
2 4 3 3 10
8 3 4 3 10
14 3 4 3 10
44 4 3 3 10
50 3 3 4 10
56 3 3 4 10
92 3 3 4 10
98 3 3 4 10
296 4 3 3 10
302 3 4 3 10
308 3 4 3 10
338 4 3 3 10
For the case of 2 summands
x <- expand.grid(c(3, 4, 17, 18, 50, 40, 3),
c(3, 4, 17, 18, 50, 40,3))
x$sums <-rowSums(x)
idx<- x$sums%in%c(10, 35, 90)
#Results
x[idx,]
Var1 Var2 sums
18 18 17 35
24 17 18 35
34 40 50 90
40 50 40 90
Upvotes: 0