Reputation: 2957
I want to calculate the cumulative intersection of two arrays. For example
> cumulative_intersection([1,3,4,5,2,6], [4,1,5,2,3,7])
[[], [1], [1,4], [1,4,5], [1,4,5,3,2], [1,4,5,3,2]]
I can brute-force implement this using a loop. For example, in Python,
a1 = [1,3,4,5,2,6]
a2 = [4,1,5,2,3,7]
n = len(a1)
all_intersects = list()
for i in range(n):
intersect = np.intersect1d(a1[:i], a2[:i])
all_intersects.append(intersect)
Is there a more efficient algorithm to compute this? My intuition is that computation can be reduced logarithmically, because each successive list is a superset of the previous one.
Upvotes: 1
Views: 359
Reputation: 34829
Something like this perhaps:
a1 = [1,3,4,5,2,6]
a2 = [4,1,5,2,3,7]
n = len(a1)
all_intersects = list()
current_intersect = list()
s1 = set()
s2 = set()
for i in range(n):
if a2[i] not in s2 and a2[i] in s1:
current_intersect.append(a2[i])
if a1[i] not in s1 and a1[i] in s2:
current_intersect.append(a1[i])
all_intersects.append(list(current_intersect))
s1.add(a1[i])
s2.add(a2[i])
The general idea is to keep two sets of numbers that have been seen. Set s1
keeps track of numbers that have been seen in list a1
. Set s2
keeps track of numbers that have been seen in list a2
.
When processing a number from list a1
, if the number hasn't been seen yet, but has been seen in the list a2
, then we can add it to the list of numbers in current_intersect
. Likewise for the number from list a2
.
After updating current_intersect
, the current_intersect
is added to all_intersects
. Also, the current array elements are added to the two sets.
Upvotes: 1
Reputation: 111
If you replace your intersection algorithm with one that doesn't sort the result (and thus takes linear time), your brute force algorithm's asymptotic complexity becomes O(n2) which is already at your problem's lower bound asymptotic complexity.
To demonstrate this lower bound, let's assume for the sake of argument, that you have an O(1) method for determining what numbers needed to be added to the ith intersection to get the (i+1)th intersection. Therefore, if C(i) is the time to copy the ith intersection, your total runtime would be
Now then, copying an array is an O(n) operation, and in the worst case where your two arrays are the same, the ith intersection has length i+1. Therefore, your worse case run-time is
That being said, you can beat your naive algorithm's run-time by a constant factor by modifying the O(n) greedy algorithm for generating the intersection of two lists.
def intersection(a: list,b: list):
''' Computes the interection of lists a and b. Assumes that a and b have the same
length'''
a_leftover = set()
b_leftover = set()
stored = set() # We only want unique values
n = len(a)
result = []
for i in range(n):
a_elm = a[i]
b_elm = b[i]
if a_elm not in stored and a_elm == b_elm:
result.append(a_elm)
stored.add(a_elm)
else:
if a_elm not in stored:
if a_elm in b_leftover:
# a_elm was previously encountered in b
result.append(a_elm)
stored.add(a_elm)
b_leftover.remove(a_elm)
else:
a_leftover.add(a_elm)
if b_elm not in stored:
if b_elm in a_leftover:
# b_elf was previously encountered in a
result.append(b_elm)
stored.add(b_elm)
a_leftover.remove(b_elm)
else:
b_leftover.add(b_elm)
return result
All you would need to do is modify the algorithm to store and return the intermediate results.
def cumulative_intersection(a: list,b: list):
''' Computes the cumulative interection of lists a and b. Assumes that a and b have the same
length'''
a_leftover = set()
b_leftover = set()
stored = set() # We only want unique values
n = len(a)
result = []
results = []
for i in range(n):
a_elm = a[i]
b_elm = b[i]
if a_elm not in stored and a_elm == b_elm:
result.append(a_elm)
stored.add(a_elm)
else:
if a_elm not in stored:
if a_elm in b_leftover:
# a_elm was previously encountered in b
result.append(a_elm)
stored.add(a_elm)
b_leftover.remove(a_elm)
else:
a_leftover.add(a_elm)
if b_elm not in stored:
if b_elm in a_leftover:
# b_elf was previously encountered in a
result.append(b_elm)
stored.add(b_elm)
a_leftover.remove(b_elm)
else:
b_leftover.add(b_elm)
results.append(result[:])
return results
Upvotes: 2