Reputation: 3637

What is the fastest algorithm for intersection of two sorted lists?

Say that there are two sorted lists: A and B.

The number of entries in A and B can vary. (They can be very small/huge. They can be similar to each other/significantly different).

What is the known to be the fastest algorithm for this functionality?

Can any one give me an idea or reference?

Upvotes: 9

Answers (3)

Ismael EL ATIFI

Reputation: 2118

Here is a simple and tested Python implementation that uses bisect search to advance pointers of both lists.
It assumes both input lists are sorted and contain no duplicates.

import bisect

def compute_intersection_list(l1, l2):
    # A is the smaller list
    A, B = (l1, l2) if len(l1) < len(l2) else (l2, l1)
    i = 0
    j = 0
    intersection_list = []
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            intersection_list.append(A[i])
            i += 1
            j += 1
        elif A[i] < B[j]:
            i = bisect.bisect_left(A, B[j], lo=i+1)
        else:
            j = bisect.bisect_left(B, A[i], lo=j+1)
    return intersection_list


# test on many random cases
import random

MM = 100  # max value

for _ in range(10000):
    M1 = random.randint(0, MM)  # random max value
    N1 = random.randint(0, M1)  # random number of values
    M2 = random.randint(0, MM)  # random max value
    N2 = random.randint(0, M2)  # random number of values
    a = sorted(random.sample(range(M1), N1))  # sampling without replacement to have no duplicates
    b = sorted(random.sample(range(M2), N2))
    assert compute_intersection_list(a, b) == sorted(set(a).intersection(b))

Upvotes: 2

David Eisenstat

Reputation: 65506

Assume that A has m elements and B has n elements, with m ≥ n. Information theoretically, the best we can do is

   (m + n)!
lg -------- = n lg (m/n) + O(n)
    m!  n!

comparisons, since in order to verify an empty intersection, we essentially have to perform a sorted merge. We can get within a constant factor of this bound by iterating through B and keeping a "cursor" in A indicating the position at which the most recent element of B should be inserted to maintain sorted order. We use exponential search to advance the cursor, for a total cost that is on the order of

lg x_1 + lg x_2 + ... + lg x_n,

where x_1 + x_2 + ... + x_n = m + n is some integer partition of m. This sum is O(n lg (m/n)) by the concavity of lg.

Upvotes: 6

Keiwan

Reputation: 8301

I don't know if this is the fastest option but here's one that runs in O(n+m) where n and m are the sizes of your lists:

Loop over both lists until one of them is empty in the following way:
Advance by one on one list.
Advance on the other list until you find a value that is either equal or greater than the current value of the other list.
If it is equal, the element belongs to the intersection and you can append it to another list
If it is greater that the other element, advance on the other list until you find a value equal or greater than this value
as said, repeat this until one of the lists is empty

Upvotes: 3

What is the fastest algorithm for intersection of two sorted lists?

Answers (3)

Related Questions