syko
syko

Reputation: 3637

What is the fastest algorithm for intersection of two sorted lists?

Say that there are two sorted lists: A and B.

The number of entries in A and B can vary. (They can be very small/huge. They can be similar to each other/significantly different).

What is the known to be the fastest algorithm for this functionality?

Can any one give me an idea or reference?

Upvotes: 9

Views: 4345

Answers (3)

Ismael EL ATIFI
Ismael EL ATIFI

Reputation: 2118

Here is a simple and tested Python implementation that uses bisect search to advance pointers of both lists.
It assumes both input lists are sorted and contain no duplicates.

import bisect

def compute_intersection_list(l1, l2):
    # A is the smaller list
    A, B = (l1, l2) if len(l1) < len(l2) else (l2, l1)
    i = 0
    j = 0
    intersection_list = []
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            intersection_list.append(A[i])
            i += 1
            j += 1
        elif A[i] < B[j]:
            i = bisect.bisect_left(A, B[j], lo=i+1)
        else:
            j = bisect.bisect_left(B, A[i], lo=j+1)
    return intersection_list


# test on many random cases
import random

MM = 100  # max value

for _ in range(10000):
    M1 = random.randint(0, MM)  # random max value
    N1 = random.randint(0, M1)  # random number of values
    M2 = random.randint(0, MM)  # random max value
    N2 = random.randint(0, M2)  # random number of values
    a = sorted(random.sample(range(M1), N1))  # sampling without replacement to have no duplicates
    b = sorted(random.sample(range(M2), N2))
    assert compute_intersection_list(a, b) == sorted(set(a).intersection(b))

Upvotes: 2

David Eisenstat
David Eisenstat

Reputation: 65506

Assume that A has m elements and B has n elements, with m ≥ n. Information theoretically, the best we can do is

   (m + n)!
lg -------- = n lg (m/n) + O(n)
    m!  n!

comparisons, since in order to verify an empty intersection, we essentially have to perform a sorted merge. We can get within a constant factor of this bound by iterating through B and keeping a "cursor" in A indicating the position at which the most recent element of B should be inserted to maintain sorted order. We use exponential search to advance the cursor, for a total cost that is on the order of

lg x_1 + lg x_2 + ... + lg x_n,

where x_1 + x_2 + ... + x_n = m + n is some integer partition of m. This sum is O(n lg (m/n)) by the concavity of lg.

Upvotes: 6

Keiwan
Keiwan

Reputation: 8301

I don't know if this is the fastest option but here's one that runs in O(n+m) where n and m are the sizes of your lists:

  • Loop over both lists until one of them is empty in the following way:
  • Advance by one on one list.
  • Advance on the other list until you find a value that is either equal or greater than the current value of the other list.
  • If it is equal, the element belongs to the intersection and you can append it to another list
  • If it is greater that the other element, advance on the other list until you find a value equal or greater than this value
  • as said, repeat this until one of the lists is empty

Upvotes: 3

Related Questions