Fast algorithm for cumulative intersection of two arrays

Question

I want to calculate the cumulative intersection of two arrays. For example

> cumulative_intersection([1,3,4,5,2,6], [4,1,5,2,3,7])
[[], [1], [1,4], [1,4,5], [1,4,5,3,2], [1,4,5,3,2]]

I can brute-force implement this using a loop. For example, in Python,

a1 = [1,3,4,5,2,6]
a2 = [4,1,5,2,3,7]
n = len(a1)
all_intersects = list()
for i in range(n):
    intersect = np.intersect1d(a1[:i], a2[:i])
    all_intersects.append(intersect)

Is there a more efficient algorithm to compute this? My intuition is that computation can be reduced logarithmically, because each successive list is a superset of the previous one.

victor zeng · Accepted Answer

If you replace your intersection algorithm with one that doesn't sort the result (and thus takes linear time), your brute force algorithm's asymptotic complexity becomes O(n²) which is already at your problem's lower bound asymptotic complexity.

To demonstrate this lower bound, let's assume for the sake of argument, that you have an O(1) method for determining what numbers needed to be added to the ith intersection to get the (i+1)th intersection. Therefore, if C(i) is the time to copy the ith intersection, your total runtime would be

$O(1) + \sum_{i=1}^{n}\left[ C(i-1) + O(1)\right] = O(1) + \sum_{i=1}^{n}C(i-1) + \sum_{i=1}^{n}O(1)$

Now then, copying an array is an O(n) operation, and in the worst case where your two arrays are the same, the ith intersection has length i+1. Therefore, your worse case run-time is

$O(1) + \sum_{i=1}^{n}O(i-1) + \sum_{i=1}^{n}O(1) = O(n^2) + O(n) =O(n^2)$

That being said, you can beat your naive algorithm's run-time by a constant factor by modifying the O(n) greedy algorithm for generating the intersection of two lists.

def intersection(a: list,b: list):
    ''' Computes the interection of lists a and b. Assumes that a and b have the same 
    length'''
    a_leftover = set()
    b_leftover = set()
    stored = set() # We only want unique values
    n = len(a)
    result = []
    for i in range(n):
        a_elm = a[i]
        b_elm = b[i]
        if a_elm not in stored and a_elm == b_elm:
            result.append(a_elm)
            stored.add(a_elm)
        else:
            if a_elm not in stored:
                if a_elm in b_leftover:
                    # a_elm was previously encountered in b
                    result.append(a_elm)
                    stored.add(a_elm)
                    b_leftover.remove(a_elm)
                else:
                    a_leftover.add(a_elm)
            if b_elm not in stored:
                if b_elm in a_leftover:
                    # b_elf was previously encountered in a
                    result.append(b_elm)
                    stored.add(b_elm)
                    a_leftover.remove(b_elm)
                else:
                    b_leftover.add(b_elm)
    return result

All you would need to do is modify the algorithm to store and return the intermediate results.

def cumulative_intersection(a: list,b: list):
    ''' Computes the cumulative interection of lists a and b. Assumes that a and b have the same 
    length'''
    a_leftover = set()
    b_leftover = set()
    stored = set() # We only want unique values
    n = len(a)
    result = []
    results = []
    for i in range(n):
        a_elm = a[i]
        b_elm = b[i]
        if a_elm not in stored and a_elm == b_elm:
            result.append(a_elm)
            stored.add(a_elm)
        else:
            if a_elm not in stored:
                if a_elm in b_leftover:
                    # a_elm was previously encountered in b
                    result.append(a_elm)
                    stored.add(a_elm)
                    b_leftover.remove(a_elm)
                else:
                    a_leftover.add(a_elm)
            if b_elm not in stored:
                if b_elm in a_leftover:
                    # b_elf was previously encountered in a
                    result.append(b_elm)
                    stored.add(b_elm)
                    a_leftover.remove(b_elm)
                else:
                    b_leftover.add(b_elm)
        results.append(result[:])
    return results

Fast algorithm for cumulative intersection of two arrays

Answers (2)

Related Questions