CaptainTrunky
CaptainTrunky

Reputation: 1707

Subset sum with maximum equal sums and without using all elements

You are given a set of integers and your task is the following: split them into 2 subsets with an equal sum in such way that these sums are maximal. You are allowed not to use all given integers, that's fine. If it's just impossible, report error somehow.

My approach is rather straightforward: at each step, we pick a single item, mark it as visited, update current sum and pick another item recursively. Finally, try skipping current element.

It works on simpler test cases, but it fails one:

T = 1

N = 25

Elements: 5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5

One can run it as follows:

1 25 5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5

I expect sum to be equal 239, but it the algorithm fails to find such solution.

I've ended up with the following code:

#include <iostream>
#include <unordered_set>

using namespace std;

unordered_set<uint64_t> visited;

const int max_N = 50;
int data[max_N];

int p1[max_N];
int p2[max_N];

int out1[max_N];
int out2[max_N];

int n1 = 0;
int n2 = 0;

int o1 = 0;
int o2 = 0;

int N = 0;

void max_sum(int16_t &sum_out, int16_t sum1 = 0, int16_t sum2 = 0, int idx = 0) {
  if (idx < 0 || idx > N) return;

  if (sum1 == sum2 && sum1 > sum_out) {
    sum_out = sum1;
    o1 = n1;
    o2 = n2;
    for(int i = 0; i < n1; ++i) {
      out1[i] = p1[i];
    }
    for (int i = 0; i < n2; ++i) {
      out2[i] = p2[i];
    }
  }
  if (idx == N) return;

  uint64_t key = (static_cast<uint64_t>(sum1) << 48) | (static_cast<uint64_t>(sum2) << 32) | idx;

  if (visited.find(key) != visited.end()) return;

  visited.insert(key);

  p1[n1] = data[idx];
  ++n1;

  max_sum(sum_out, sum1 + data[idx], sum2, idx + 1);
  --n1;

  p2[n2] = data[idx];
  ++n2;

  max_sum(sum_out, sum1, sum2 + data[idx], idx + 1);
  --n2;

  max_sum(sum_out, sum1, sum2, idx + 1);
}

int main() {
  int T = 0;
  cin >> T;

  for (int t = 1; t <= T; ++t) {
    int16_t sum_out;

    cin >> N;
    for(int i = 0; i < N; ++i) {
      cin >> data[i];
    }

    n1 = 0;
    n2 = 0;

    o1 = 0;
    o2 = 0;

    max_sum(sum_out);

    int res = 0;
    int res2 = 0;

    for (int i = 0; i < o1; ++i) res += out1[i];
    for (int i = 0; i < o2; ++i) res2 += out2[i];

    if (res != res2) cerr << "ERROR: " << "res1 = " << res << "; res2 = " << res2 << '\n';

    cout << "#" << t << " " << res << '\n';

    visited.clear();
  }
}

I have the following questions:

  1. Could someone help me to troubleshoot the failing test? Are there any obvious problems?
  2. How could I get rid of unordered_set for marking already visited sums? I prefer to use plain C.
  3. Is there a better approach? Maybe using dynamic programming?

Upvotes: 0

Views: 1174

Answers (3)

Rahul K Singh
Rahul K Singh

Reputation: 189

Another approach is consider all the numbers till [1,(2^N-2)]. Consider the position of each bit to position of each element .Iterate all numbers from [1,(2^N-2)] then check for each number . If bit is set you can count that number in set1 else you can put that number in set2 , then check if sum of both sets are equals or not . Here you will get all possible sets , if you want just one once you find just break.

Upvotes: 4

Peter de Rivaz
Peter de Rivaz

Reputation: 33509

1) Could someone help me to troubleshoot the failing test? Are there any obvious problems?

The only issue I could see is that you have not set sum_out to 0.

When I tried running the program it seemed to work correctly for your test case.

2) How could I get rid of unordered_set for marking already visited sums? I prefer to use plain C.

See the answer to question 3

3) Is there a better approach? Maybe using dynamic programming?

You are currently keeping track of whether you have seen each choice of value for first subset, value for second subset, amount through array.

If instead you keep track of the difference between the values then the complexity significantly reduces.

In particular, you can use dynamic programming to store an array A[diff] that for each value of the difference either stores -1 (to indicate that the difference is not reachable), or the greatest value of subset1 when the difference between subset1 and subset2 is exactly equal to diff.

You can then iterate over the entries in the input and update the array based on either assigning each element to subset1/subset2/ or not at all. (Note you need to make a new copy of the array when computing this update.)

In this form there is no use of unordered_set because you can simply use a straight C array. There is also no difference between subset1 and subset2 so you can only keep positive differences.

Example Python Code

from collections import defaultdict

data=map(int,"5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5".split())

A=defaultdict(int) # Map from difference to best value of subset sum 1
A[0] = 0 # We start with a difference of 0

for a in data:
    A2 = defaultdict(int)
    def add(s1,s2):
        if s1>s2:
            s1,s2=s2,s1
        d = s2-s1
        if d in A2:
            A2[d] = max( A2[d], s1 )
        else:
            A2[d] = s1
    for diff,sum1 in A.items():
        sum2 = sum1 + diff
        add(sum1,sum2)
        add(sum1+a,sum2)
        add(sum1,sum2+a)
    A = A2
print A[0]

This prints 239 as the answer.

For simplicity I haven't bothered with the optimization of using a linear array instead of the dictionary.

Upvotes: 2

Erwin Kalvelagen
Erwin Kalvelagen

Reputation: 16724

A very different approach would be to use a constraint or mixed integer solver. Here is a possible formulation.

Let

x(i,g) = 1 if value v(i) belongs to group g
         0 otherwise

The optimization model can look like:

 max s
 s = sum(i, x(i,g)*v(i))  for all g
 sum(g, x(i,g)) <= 1      for all i

For two groups we get:

----     31 VARIABLE s.L                   =      239.000  

----     31 VARIABLE x.L  

             g1          g2

i1            1
i2                        1
i3                        1
i4                        1
i5            1
i6                        1
i7                        1
i8            1
i9                        1
i10           1
i11           1
i12           1
i13                       1
i14           1
i15           1
i16                       1
i17           1
i18                       1
i19                       1
i20           1
i21           1
i22           1
i23                       1
i25                       1

We can easily do more groups. E.g. with 9 groups:

----     31 VARIABLE s.L                   =       52.000  

----     31 VARIABLE x.L  

             g1          g2          g3          g4          g5          g6          g7          g8          g9

i2                        1
i3                                                                                                            1
i4                                                                        1
i5                                                1
i6                                                                                    1
i7            1
i8                        1
i9                                                                                                1
i10                                   1
i11           1
i12                                                                                   1
i13                                                                                               1
i14                                               1
i15                                                           1
i16                                                                       1
i17                                   1
i19                                               1
i20                                   1
i21                                                                                                           1
i22                                                                                   1
i23                                                           1
i24                                                                                                           1
i25                                                                       1

If there is no solution, the solver will select zero elements in each group with a sum s=0.

Upvotes: 0

Related Questions