Reputation: 143

Intersection of two lists of ranges in Python

A friend of mine passed me over an interview question he recently got and I wasn't very happy with my approach to the solution. The question is as follows:

You have two lists.
Each list will contain lists of length 2, which represent a range (ie. [3,5] means a range from 3 to 5, inclusive).
You need to return the intersection of all ranges between the sets. If I give you [1,5] and [0,2], the result would be [1,2].
Within each list, the ranges will always increase and never overlap (i.e. it will be [[0, 2], [5, 10] ... ] never [[0,2], [2,5] ... ])

In general there are no "gotchas" in terms of the ordering or overlapping of the lists.

Example:

a = [[0, 2], [5, 10], [13, 23], [24, 25]]
b = [[1, 5], [8, 12], [15, 18], [20, 24]]

Expected output: [[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]

My lazy solution involved spreading the list of ranges into a list of integers then doing a set intersection, like this:

def get_intersection(x, y):
    x_spread = [item for sublist in [list(range(l[0],l[1]+1)) for l in x] for item in sublist]
    y_spread = [item for sublist in [list(range(l[0],l[1]+1)) for l in y] for item in sublist]
    flat_intersect_list = list(set(x_spread).intersection(y_spread))
...

But I imagine there's a solution that's both readable and more efficient.

Please explain how you would mentally tackle this problem, if you don't mind. A time/space complexity analysis would also be helpful.

Thanks

Upvotes: 12

Answers (6)

Guybrush

Reputation: 2790

I know this question already got a correct answer. For completeness, I would like to mention I developed some time ago a Python library, namely portion (https://github.com/AlexandreDecan/portion) that supports this kind of operations (intersections between list of atomic intervals).

You can have a look at the implementation, it's quite close to some of the answers that were provided here: https://github.com/AlexandreDecan/portion/blob/master/portion/interval.py#L406

To illustrate its usage, let's consider your example:

a = [[0, 2], [5, 10], [13, 23], [24, 25]]
b = [[1, 5], [8, 12], [15, 18], [20, 24]]

We need to convert these "items" to closed (atomic) intervals first:

import portion as P

a = [P.closed(x, y) for x, y in a]
b = [P.closed(x, y) for x, y in b]

print(a)

... displays [[0,2], [5,10], [13,23], [24,25]] (each [x,y] is an Interval object).

Then we can create an interval that represents the union of these atomic intervals:

a = P.Interval(*a)
b = P.Interval(*b)

print(b)

... displays [0,2] | [5,10] | [13,23] | [24,25] (a single Interval object, representing the union of all the atomic ones).

And now we can easily compute the intersection:

c = a & b
print(c)

... displays [1,2] | [5] | [8,10] | [15,18] | [20,23] | [24].

Notice that our answer differs from yours ([20,23] | [24] instead of [20,24]) since the library expects continuous domains for values. We can quite easily convert the results to discrete intervals following the approach proposed in https://github.com/AlexandreDecan/portion/issues/24#issuecomment-604456362 as follows:

def discretize(i, incr=1):
  first_step = lambda s: (P.OPEN, (s.lower - incr if s.left is P.CLOSED else s.lower), (s.upper + incr if s.right is P.CLOSED else s.upper), P.OPEN)
  second_step = lambda s: (P.CLOSED, (s.lower + incr if s.left is P.OPEN and s.lower != -P.inf else s.lower), (s.upper - incr if s.right is P.OPEN and s.upper != P.inf else s.upper), P.CLOSED)
  return i.apply(first_step).apply(second_step)

print(discretize(c))

... displays [1,2] | [5] | [8,10] | [15,18] | [20,24].

Upvotes: 3

user2846495

Reputation: 189

Algorithm

Given two intervals, if they overlap, then the intersection's starting point is the maximum of the starting points of the two intervals, and its stopping point is the minimum of the stopping points:

To find all the pairs of intervals that might intersect, start with the first pair and keep incrementing the interval with the lower stopping point:

At most m + n pairs of intervals are considered, where m is length of the first list, and n is the length of the second list. Calculating the intersection of a pair of intervals is done in constant time, so this algorithm's time-complexity is O(m+n).

Implementation

To keep the code simple, I'm using Python's built-in range object for the intervals. This is a slight deviation from the problem description in that ranges are half-open intervals rather than closed. That is,

(x in range(a, b)) == (a <= x < b)

Given two range objects x and y, their intersection is range(start, stop), where start = max(x.start, y.start) and stop = min(x.stop, y.stop). If the two ranges don't overlap, then start >= stop and you just get an empty range:

>>> len(range(1, 0))
0

So given two lists of ranges, xs and ys, each increasing in start value, the intersection can be computed as follows:

def intersect_ranges(xs, ys):

    # Merge any abutting ranges (implementation below):
    xs, ys = merge_ranges(xs), merge_ranges(ys)

    # Try to get the first range in each iterator:
    try:
        x, y = next(xs), next(ys)
    except StopIteration:
        return

    while True:
        # Yield the intersection of the two ranges, if it's not empty:
        intersection = range(
            max(x.start, y.start),
            min(x.stop, y.stop)
        )
        if intersection:
            yield intersection

        # Try to increment the range with the earlier stopping value:
        try:
            if x.stop <= y.stop:
                x = next(xs)
            else:
                y = next(ys)
        except StopIteration:
            return

It seems from your example that the ranges can abut. So any abutting ranges have to be merged first:

def merge_ranges(xs):
    start, stop = None, None
    for x in xs:
        if stop is None:
            start, stop = x.start, x.stop
        elif stop < x.start:
            yield range(start, stop)
            start, stop = x.start, x.stop
        else:
            stop = x.stop
    yield range(start, stop)

Applying this to your example:

>>> a = [[0, 2], [5, 10], [13, 23], [24, 25]]
>>> b = [[1, 5], [8, 12], [15, 18], [20, 24]]
>>> list(intersect_ranges(
...     (range(i, j+1) for (i, j) in a),
...     (range(i, j+1) for (i, j) in b)
... ))
[range(1, 3), range(5, 6), range(8, 11), range(15, 19), range(20, 25)]

Upvotes: 9

shannon

Reputation: 8784

Answering your question as I personally would probably answer an interview question and probably also most appreciate an answer; the interviewee's goal is probably to demonstrate a range of skills, not limited strictly to python. So this answer is admittedly going to be more abstract than others here.

It might be helpful to ask for information about any constraints I'm operating under. Operation time and space complexity are common constraints, as is development time, all of which are mentioned in previous answers here; but other constraints might also arise. As common as any of those is maintenance and integration with existing code.

Within each list, the ranges will always increase and never overlap

When I see this, it probably means there is some pre-existing code to normalize the list of ranges, that sorts ranges and merges overlap. That's a pretty common union operation. When joining an existing team or ongoing project, one of the most important factors for success is integrating with existing patterns.

Intersection operation can also be performed via a union operation. Invert the sorted ranges, union them, and invert the result.

To me, that answer demonstrates experience with algorithms generally and "range" problems specifically, an appreciation that the most readable and maintainable code approach is typically reusing existing code, and a desire to help a team succeed over simply puzzling on my own.

Another approach is to sort both lists together into one iterable list. Iterate the list, reference counting each start/end as increment/decrement steps. Ranges are emitted on transitions between reference counts of 1 and 2. This approach is inherently extensible to support more than two lists, if the sort operation meets our needs (and they usually do).

Unless instructed otherwise, I would offer the general approaches and discuss reasons I might use each before writing code.

So, there's no code here. But you did ask for general approaches and thinking :D

Upvotes: 0

BallpointBen

Reputation: 13989

OP, I believe this solution works, and it runs in O(m+n) time where m and n are the lengths of the lists. (To be sure, make ranges a linked list so that changing its length runs in constant time.)

def intersections(a,b):
    ranges = []
    i = j = 0
    while i < len(a) and j < len(b):
        a_left, a_right = a[i]
        b_left, b_right = b[j]

        if a_right < b_right:
            i += 1
        else:
            j += 1

        if a_right >= b_left and b_right >= a_left:
            end_pts = sorted([a_left, a_right, b_left, b_right])
            middle = [end_pts[1], end_pts[2]]
            ranges.append(middle)

    ri = 0
    while ri < len(ranges)-1:
        if ranges[ri][1] == ranges[ri+1][0]:
            ranges[ri:ri+2] = [[ranges[ri][0], ranges[ri+1][1]]]

        ri += 1

    return ranges

a = [[0,2], [5,10], [13,23], [24,25]]
b = [[1,5], [8,12], [15,18], [20,24]]
print(intersects(a,b))
# [[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]

Upvotes: 9

TemporalWolf

Reputation: 7952

[[max(first[0], second[0]), min(first[1], second[1])] 
  for first in a for second in b 
  if max(first[0], second[0]) <= min(first[1], second[1])]

A list comprehension which gives the answer: [[1, 2], [5, 5], [8, 10], [15, 18], [20, 23], [24, 24]]

Breaking it down:

[[max(first[0], second[0]), min(first[1], second[1])]

Maximum of the first term, Min of the 2nd term

for first in a for second in b

For all combinations of first and second term:

if max(first[0], second[0]) <= min(first[1], second[1])]

Only if the max of the first does not exceed the minimum of the second.

If you need the output compacted, then the following function does that (In O(n^2) time because deletion from a list is O(n), a step we perform O(n) times):

def reverse_compact(lst):
    for index in range(len(lst) - 2,-1,-1):
        if lst[index][1] + 1 >= lst[index + 1][0]:
            lst[index][1] = lst[index + 1][1]
            del lst[index + 1]  # remove compacted entry O(n)*
    return lst

It joins ranges which touch, given they are in-order. It does it in reverse because then we can do this operation in place and delete the compacted entries as we go. If we didn't do it in reverse, deleting other entries would muck with our index.

>>> reverse_compact(comp)
[[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]

The compacting function can be reduced further to O(n) by doing a forward in place compaction and copying back the elements, as then each inner step is O(1) (get/set instead of del), but this is less readable:

This runs in O(n) time and space complexity:

def compact(lst):
    next_index = 0  # Keeps track of the last used index in our result
    for index in range(len(lst) - 1):
        if lst[next_index][1] + 1 >= lst[index + 1][0]:
            lst[next_index][1] = lst[index + 1][1]
        else:    
            next_index += 1
            lst[next_index] = lst[index + 1]
    return lst[:next_index + 1]

Using either compactor, the list comprehension is the dominating term here, with time =O(n*m), space = O(m+n), as it compares all possible combinations of the two lists with no early outs. This does not take advantage of the ordered structure of the lists given in the prompt: you could exploit that structure to reduce the time complexity to O(n + m) as they always increase and never overlap, meaning you can do all comparisons in a single pass.

Note there is more than one solution and hopefully you can solve the problem and then iteratively improve upon it.

A 100% correct answer which satisfies all possible inputs is not the goal of an interview question. It is to see how a person thinks and handles challenges, and whether they can reason about a solution.

In fact, if you give me a 100% correct, textbook answer, it's probably because you've seen the question before and you already know the solution... and therefore that question isn't helpful to me as an interviewer. 'Check, can regurgitate solutions found on StackOverflow.' The idea is to watch you solve a problem, not regurgitate a solution.

Too many candidates miss the forest for the trees: Acknowledging shortcomings and suggesting solutions is the right way to go about an answer to an interview questions. You don't have to have a solution, you have to show how you would approach the problem.

Your solution is fine if you can explain it and detail potential issues with using it.

I got my current job by failing to answer an interview question: After spending the majority of my time trying, I explained why my approach didn't work and the second approach I would try given more time, along with potential pitfalls I saw in that approach (and why I opted for my first strategy initially).

Upvotes: 13

Gene

Reputation: 47020

I'm no kind of python programmer, but don't think this problem is amenable to slick Python-esque short solutions that are also efficient.

Mine treats the interval boundaries as "events" labeled 1 and 2, processing them in order. Each event toggles the respective bit in a parity word. When we toggle to or from 3, it's time to emit the beginning or end of an intersection interval.

The tricky part is that e.g. [13, 23], [24, 25] is being treated as [13, 25]; adjacent intervals must be concatenated. The nested if below takes care of this case by continuing the current interval rather than starting a new one. Also, for equal event values, interval starts must be processed before ends so that e.g. [1, 5] and [5, 10] will be emitted as [5, 5] rather than nothing. That's handled with the middle field of the event tuples.

This implementation is O(n log n) due to the sorting, where n is the total length of both inputs. By merging the two event lists pairwise, it could be O(n), but this article suggests that the lists must be huge before the library merge will beat the library sort.

def get_isect(a, b):
  events = (map(lambda x: (x[0], 0, 1), a) + map(lambda x: (x[1], 1, 1), a)
          + map(lambda x: (x[0], 0, 2), b) + map(lambda x: (x[1], 1, 2), b))
  events.sort()
  prevParity = 0
  isect = []
  for event in events:
    parity = prevParity ^ event[2]
    if parity == 3:
      # Maybe start a new intersection interval.
      if len(isect) == 0 or isect[-1][1] < event[0] - 1:
        isect.append([event[0], 0])
    elif prevParity == 3:
      # End the current intersection interval.
      isect[-1][1] = event[0]
    prevParity = parity
  return isect

Here is an O(n) version that's a bit more complex because it finds the next event on the fly by merging the input lists. It also requires only constant storage beyond inputs and output:

def get_isect2(a, b):
  ia = ib = prevParity = 0
  isect = []
  while True:
    aVal = a[ia / 2][ia % 2] if ia < 2 * len(a) else None
    bVal = b[ib / 2][ib % 2] if ib < 2 * len(b) else None
    if not aVal and not bVal: break
    if not bVal or aVal < bVal or (aVal == bVal and ia % 2 == 0):
      parity = prevParity ^ 1
      val = aVal
      ia += 1
    else:
      parity = prevParity ^ 2
      val = bVal
      ib += 1
    if parity == 3:
      if len(isect) == 0 or isect[-1][1] < val - 1:
        isect.append([val, 0])
    elif prevParity == 3:
      isect[-1][1] = val
    prevParity = parity
  return isect

Upvotes: 0

Intersection of two lists of ranges in Python

Answers (6)

Algorithm

Implementation

Related Questions