Reputation: 194

Largest subarray with sum equal to 0

This is a typical interview question. Given an array that contains both positive and negative elements without 0, find the largest subarray whose sum equals 0. I tried to solve this. This is what I came up with.

def sub_array_sum(array,k=0):
    start_index = -1
    hash_sum = {}
    current_sum = 0
    keys = set()
    best_index_hash = {}
    for i in array:
        start_index += 1
        current_sum += i
        if current_sum in hash_sum:
            hash_sum[current_sum].append(start_index)
            keys.add(current_sum)
        else:
            if current_sum == 0:
                best_index_hash[start_index] = [(0,start_index)]
            else:
                hash_sum[current_sum] = [start_index]
    if keys:
        for k_1 in keys:
            best_start = hash_sum.get(k_1)[0]
            best_end_list = hash_sum.get(k_1)[1:]
            for best_end in best_end_list:
                if abs(best_start-best_end) in best_index_hash:
                    best_index_hash[abs(best_start-best_end)].append((best_start+1,best_end))
                else:
                    best_index_hash[abs(best_start-best_end)] = [(best_start+1,best_end)]

    if best_index_hash:
        (bs,be) = best_index_hash[max(best_index_hash.keys(),key=int)].pop()
        return array[bs:be+1]
    else:
        print "No sub array with sum equal to 0"


def Main():
    a = [6,-2,8,5,4,-9,8,-2,1,2]
    b = [-8,8]
    c = [-7,8,-1]
    d = [2200,300,-6,6,5,-9]
    e = [-9,9,-6,-3]
    print sub_array_sum(a)
    print sub_array_sum(b)
    print sub_array_sum(c)
    print sub_array_sum(d)
    print sub_array_sum(e)

if __name__ == '__main__':
    Main()

I am not sure if this will satisfy all the edge case. if someone can comment on that, it would be excellent Also i want to extend this to sum equalling to any K not just 0. How should i go about it. And any pointers to optimize this further is also helpful.

Upvotes: 1

Answers (4)

Shreyas J

Reputation: 1

Here's the solution taken from LeetCode :

def sub_array_sum(nums, k=0):
    count, sum = 0, 0
    map = dict()
    map[0] = 1
    for i in range(len(nums)):
        sum += nums[i]
        if map.__contains__(sum - k):
            count += map[sum - k]
        map[sum] = map.get(sum, 0) + 1
return count

Upvotes: 0

j_random_hacker

Reputation: 51316

You have given a nice, linear-time solution (better than the two other answers at this time, which are quadratic-time), based on the idea that whenever sum(i .. j) = 0, it must be that sum(0 .. i-1) = sum(0 .. j) and vice versa. Essentially you compute the prefix sums sum(0 .. i) for all i, building up a hashtable hash_sum in which hash_sum[x] is a list of all positions i having sum(0 .. i) = x. Then you go through this hashtable, one sum at a time, looking for any sum that was made by more than one prefix. Among all such made-more-than-once sums, you choose the one that was made by a pair of prefixes that are furthest apart -- this is the longest.

Since you already noticed the key insight needed to make this algorithm linear-time, I'm a bit puzzled as to why you build up so much unnecessary stuff in best_index_hash in your second loop. For a given sum x, the furthest-apart pair of prefixes that make that sum will always be the smallest and largest entries in hash_sum[x], which will necessarily be the first and last entries (because that's the order they were appended), so there's no need to loop over the elements in between. In fact you don't even need a second loop at all: you can keep a running maximum during your first loop, by treating start_index as the rightmost endpoint.

To handle an arbitrary difference k: Instead of finding the leftmost occurrence of a prefix summing to current_sum, we need to find the leftmost occurrence of a prefix summing to current_sum - k. But that's just first_with_sum{current_sum - k}.

The following code isn't tested, but should work:

def sub_array_sum(array,k=0):
    start_index = -1
    first_with_sum = {}
    first_with_sum{0} = -1
    best_start = -1
    best_len = 0
    current_sum = 0
    for i in array:
        start_index += 1
        current_sum += i
        if current_sum - k in first_with_sum:
            if start_index - first_with_sum{current_sum - k} > best_len:
                best_start = first_with_sum{current_sum - k} + 1
                best_len = start_index - first_with_sum{current_sum - k}
        else:
            first_with_sum{current_sum} = start_index

    if best_len > 0:
        return array[best_start:best_start+best_len-1]
    else:
        print "No subarray found"

Setting first_with_sum{0} = -1 at the start means that we don't have to treat a range beginning at index 0 as a special case. Note that this algorithm doesn't improve on the asymptotic time or space complexity of your original one, but it's simpler to implement and will use a small amount less space on any input that contains a zero-sum subarray.

Upvotes: 2

Chris Martin

Reputation: 30756

Here's my own answer, just for fun.

The number of subsequences is quadratic, and the time to sum a subsequence is linear, so the most naive solution would be cubic.

This approach is just an exhaustive search over the subsequences, but a little trickery avoids the linear summing factor, so it's only quadratic.

from collections import namedtuple
from itertools import chain


class Element(namedtuple('Element', ('index', 'value'))):
    """
    An element in the input sequence. ``index`` is the position
    of the element, and ``value`` is the element itself.
    """
    pass


class Node(namedtuple('Node', ('a', 'b', 'sum'))):
    """
    A node in the search graph, which looks like this:

         0      1       2      3
          \    /  \    /  \   /
           0-1     1-2     2-3
              \   /   \   /
               0-2     1-3
                  \   /
                   0-3

    ``a`` is the start Element, ``b`` is the end Element, and
    ``sum`` is the sum of elements ``a`` through ``b``.
    """

    @classmethod
    def from_element(cls, e):
        """Construct a Node from a single Element."""
        return Node(a=e, b=e, sum=e.value)

    def __add__(self, other):
        """The combining operation depicted by the graph above."""
        assert self.a.index == other.a.index - 1
        assert self.b.index == other.b.index - 1
        return Node(a=self.a, b=other.b, sum=(self.sum + other.b.value))

    def __len__(self):
        """The number of elements represented by this node."""
        return self.b.index - self.a.index + 1


def get_longest_k_sum_subsequence(ints, k):
    """The longest subsequence of ``ints`` that sums to ``k``."""
    n = get_longest_node(n for n in generate_nodes(ints) if n.sum == k)
    if n:
        return ints[n.a.index:(n.b.index + 1)]
    if k == 0:
        return []


def get_longest_zero_sum_subsequence(ints):
    """The longest subsequence of ``ints`` that sums to zero."""
    return get_longest_k_sum_subsequence(ints, k=0)


def generate_nodes(ints):
    """Generates all Nodes in the graph."""
    nodes = [Node.from_element(Element(i, v)) for i, v in enumerate(ints)]
    while len(nodes) > 0:
        for n in nodes:
            yield n
        nodes = [x + y for x, y in zip(nodes, nodes[1:])]


def get_longest_node(nodes):
    """The longest Node in ``nodes``, or None if there are no Nodes."""
    return max(chain([()], nodes), key=len) or None


if __name__ == '__main__':

    def f(*ints):
        return get_longest_zero_sum_subsequence(list(ints))

    assert f() == []
    assert f(1) == []
    assert f(0) == [0]
    assert f(0, 0) == [0, 0]
    assert f(-1, 1) == [-1, 1]
    assert f(-1, 2, 1) == []
    assert f(1, -1, 1, -1) == [1, -1, 1, -1]
    assert f(1, -1, 8) == [1, -1]
    assert f(0, 1, -1, 8) == [0, 1, -1]
    assert f(5, 6, -2, 1, 1, 7, -2, 2, 8) == [-2, 1, 1]
    assert f(5, 6, -2, 2, 7, -2, 1, 1, 8) == [-2, 1, 1]

Upvotes: 2

Sriram

Reputation: 511

I agree with sundar nataraj when he says that this must be posted to the code review forum.

For fun though I looked at your code. Though I am able to understand your approach, I fail to understand the need to use Counter.

best_index_hash[start_index] = [(0,start_index)] - Here best_index_hash is of the type Counter. Why are you assigning a list to it?
for key_1, value_1 in best_index_hash.most_common(1) - You trying to get largest subsequence and for that you are using most_common as the answer. This is not intuitive semantically.

I am tempted to post a solution but I will wait for you to edit the code snippet and improve it.

Addendum

For fun, I had a go at this puzzle and I present my effort below. I make no guarantees of correctness/completeness.

from collections import defaultdict

def max_sub_array_sum(a, s):
    if a:
        span = defaultdict(lambda : (0,0))
        current_total = 0
        for i in xrange(len(a)):
            current_total = a[i]
            for j in xrange (i + 1, len(a)):
                current_total +=  a[j]
                x,y = span[current_total]
                if j - i > y - x:
                    span[current_total] = i,j

        if s in span:
            i, j = span[s]
            print "sum=%d,span_length=%d,indices=(%d,%d),sequence=%s" %\
                    (s, j-i + 1, i, j, str(a[i:j + 1]))
            return
    print "Could not find a subsequence of sum %d in sequence %s" % \
            (s, str(a))

max_sub_array_sum(range(-6, -1), 0)
max_sub_array_sum(None, 0)
max_sub_array_sum([], 0)
max_sub_array_sum(range(6), 15)
max_sub_array_sum(range(6), 14)
max_sub_array_sum(range(6), 13)
max_sub_array_sum(range(6), 0)

Upvotes: 1

Largest subarray with sum equal to 0

Answers (4)

Related Questions