nishant_boro

Reputation: 382

Given an array of integers, create partitions where the sum of the elements in every partition is 0 and the maximum number of partitions is formed.

My rules:

Example: if A = {-2, -4, -6, +6, +6} then B = {{-2, -4, +6}, {+6, -6}} is the result with the maximum number of partitions.

FYI, I want to return all the partitions and not the number of partitions.

From my research, this seemed like an NP-hard/complete problem. But I'm not sure about it and would really appreciate it if someone could explain the best way to go about it (even if it's slow). Some pseudo-code would be appreciated too.

Thank you.

Upvotes: 0

Views: 454

Answers (2)

btilly

Reputation: 46408

This definitely has an NP-hard feel to it. In particular, the question of whether it is possible to form 2 partitions instead of 1 is the question of whether a proper subset of all the elements except the last adds up to the negative of the last, which is a variant of the subset sum problem. So even verifying that the answer can't be improved further by splitting one of the partitions is probably NP-complete!

But how would one solve this in practice?

Step 1 is to produce a data structure representing all possible partitions that include the first element and sum to 0. This can be built as in the standard subset sum algorithm, with the idea of a doubly linked list from which we can recover the information. (The doubly linked list will generally be proportional in size to the length of your array times the number of distinct sums found, but when decoded it can yield an exponential number of partitions.)

Doubly linked lists can make your head spin, so you probably want to use syntactic sugar as follows:

from collections import namedtuple
Node = namedtuple('Node', ['ind', 'prev'])
Paths = namedtuple('Paths', ['node', 'prev'])

Where ind is an index into your array, node is always a Node, and prev is always a Paths or None.

And now a Node says "include this index and any valid path in the following previous choices". And a Paths says "here is a node that leads here, and a previous paths for other ways."
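To make that concrete, here is a tiny hand-built chain (illustrative only; the variable names `start`, `via_1`, and `both` are mine, not part of the algorithm):

```python
from collections import namedtuple

Node = namedtuple('Node', ['ind', 'prev'])
Paths = namedtuple('Paths', ['node', 'prev'])

# Suppose subsets {0, 1} and {0, 2} both reach the same running sum.
start = Paths(Node(0, None), None)   # the lone subset {0}
via_1 = Paths(Node(1, start), None)  # {0, 1}: index 1 on top of {0}
both = Paths(Node(2, start), via_1)  # {0, 2}, chained to the {0, 1} alternative

assert both.node.ind == 2            # one way in: take index 2...
assert both.node.prev is start       # ...after any path in 'start'
assert both.prev is via_1            # other ways: the {0, 1} alternative
```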

With this we can do the following:

# We only want paths that take the 0th index.
ways_to_get_n = {elements[0]: Paths(Node(0, None), None)}

for i in range(1, len(elements)):
    element = elements[i]
    new_ways_to_get_n = dict(ways_to_get_n)  # carry forward the old sums
    for value, ways in ways_to_get_n.items():
        if value + element not in new_ways_to_get_n:
            new_ways_to_get_n[value + element] = Paths(Node(i, ways), None)
        else:
            new_ways_to_get_n[value + element] = Paths(Node(i, ways), new_ways_to_get_n[value + element])
    ways_to_get_n = new_ways_to_get_n

When you are done, ways_to_get_n[0] is a fairly concise data structure that can be walked with a double recursion to enumerate all partitions that include the first element. However there is a challenge: these partitions may themselves contain smaller zero-sum partitions, i.e. they may not be minimal. So as you walk down the double recursion, carry with you a data structure of all values you can reach (the same old subset sum trick), and terminate early whenever 0 shows up. (This bookkeeping may feel like a lot of extra work, but it actually will save you a lot more.)
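Putting the DP and the decoding together, here is a runnable sketch (it condenses the loop above and omits the early-termination bookkeeping; `zero_subsets_with_first` and `decode` are my own names):

```python
from collections import namedtuple

Node = namedtuple('Node', ['ind', 'prev'])
Paths = namedtuple('Paths', ['node', 'prev'])

def zero_subsets_with_first(elements):
    # Subset-sum DP from above: map each reachable sum to a Paths
    # chain of ways to reach it, all including index 0.
    ways_to_get_n = {elements[0]: Paths(Node(0, None), None)}
    for i in range(1, len(elements)):
        element = elements[i]
        new_ways = dict(ways_to_get_n)  # carry forward the old sums
        for value, ways in ways_to_get_n.items():
            new_ways[value + element] = Paths(
                Node(i, ways), new_ways.get(value + element))
        ways_to_get_n = new_ways
    return ways_to_get_n.get(0)

def decode(paths):
    # Double recursion: the while loop walks alternative Paths,
    # the recursive call extends each path one Node at a time.
    while paths is not None:
        node = paths.node
        if node.prev is None:
            yield [node.ind]
        else:
            for rest in decode(node.prev):
                yield rest + [node.ind]
        paths = paths.prev

elements = [-2, -4, -6, 6, 6]
subsets = sorted(decode(zero_subsets_with_first(elements)))
# every subset includes index 0 and sums to zero
```

For the question's example this yields the index sets [0, 1, 3], [0, 1, 4], and [0, 1, 2, 3, 4]; the last one contains {3, ...} leftovers that the early-termination bookkeeping would prune.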

And now you are able to recursively find minimal partitions containing the first element. Then recursively look for ways to partition the remaining elements. Every time you complete a decomposition, compare it to the best you currently have, and if it is an improvement, record it.

When you've gone through every way to decompose it, you're done.

Assuming an array of integers (so the pseudopolynomial behavior of subset sum is likely to work in your favor), this will allow you to fairly efficiently do recursion on minimal partitions, which is a milder combinatorial explosion than a naive approach. But it will still be a lot bigger than you think.

I suspect that sorting the array by increasing absolute value as a first step will make this algorithm more efficient, by making the "filter for irreducible partitions" check likely to break out of the recursion early while you still have a lot of elements.

Upvotes: 2

Prune

Reputation: 77857

I agree that the problem looks NP-hard. Optimizing this is ugly, with a capital "ugh". You can improve the search time somewhat, but I fear that it's still O(2^N) or worse.

  • Partition the list into positive and negative numbers; sort both lists.
  • For each element of the shorter list, look for its complement in the other. Each such pair is a partition; these can be safely put on the solution list and removed from further processing.
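The two bullets above can be sketched as a greedy pre-pass (`peel_exact_pairs` is a hypothetical helper name; it also treats any literal 0 as its own singleton partition):

```python
from collections import Counter

def peel_exact_pairs(nums):
    """Pull out (x, -x) pairs, which are guaranteed-safe
    zero-sum partitions, and return the leftover elements."""
    counts = Counter(nums)
    pairs, leftover = [], []
    for x in sorted(counts):
        if x < 0 and counts[x] and counts.get(-x, 0):
            k = min(counts[x], counts[-x])   # pair as many as possible
            pairs.extend([(x, -x)] * k)
            counts[x] -= k
            counts[-x] -= k
    # a literal 0 is itself a zero-sum singleton partition
    for _ in range(counts.pop(0, 0)):
        pairs.append((0,))
    for x, c in counts.items():
        leftover.extend([x] * c)
    return pairs, leftover

pairs, rest = peel_exact_pairs([-2, -4, -6, 6, 6])
# pairs = [(-6, 6)]; rest = [-2, -4, 6] remains for the hard phase
```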

Now comes the ugly part. The relatively brute-force approach is to generate the power set of each list, indexed by sum. For instance, given the list 2, 3, 4, 7:

 2  [2]
 3  [3]
 4  [4]
 5  [2, 3]
 6  [2, 4]
 7  [7] [3, 4]
 9  [2, 7] [2, 3, 4]
10  [3, 7]
11  [4, 7]
12  [2, 3, 7]
13  [2, 4, 7]
14  [3, 4, 7]
16  [2, 3, 4, 7]

Now find all matches of abs(sum(subset)) between the positive and negative listings. These form the choices for your solution space. My best recommendation from this point is to apply a set cover approach to this; you'll have to tweak it slightly for duplicate values.
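A sketch of building the sum-indexed power set from the table above and matching the two sides (the negative list [-9, -7] is an invented example for illustration; `subsets_by_sum` is my own name):

```python
from itertools import combinations
from collections import defaultdict

def subsets_by_sum(nums):
    """Index every non-empty subset by its sum, as in the table above."""
    table = defaultdict(list)
    for r in range(1, len(nums) + 1):
        for combo in combinations(nums, r):
            table[sum(combo)].append(list(combo))
    return table

pos = subsets_by_sum([2, 3, 4, 7])
neg = subsets_by_sum([-9, -7])

# Candidate zero-sum partitions: a positive subset paired with a
# negative subset whose sums cancel.
candidates = [(p, n) for s in pos
              for p in pos[s] for n in neg.get(-s, [])]
# e.g. ([3, 4], [-7]) and ([2, 7], [-9]) are among the candidates
```

Note the exponential blow-up: a list of N elements produces 2^N - 1 subsets, which is why this stays a brute-force step.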

Upvotes: 1
