Looking for corresponding intervals for points

Question

I have 5 tuples A, B, C, D and E representing intervals. Their intersection is empty (for each pair of them) and they are sort such as the upper limit of one interval is smaller than the lower limit of the following interval.

For example:

A = (5, 10)
B = (21, 29)
C = (134, 160)
D = (900, 1050)
E = (1080, 1100)

intervals = [A, B, C, D, E]

I also have a list X of points sorted in increasing order.

For example:

X = [6, 28, 130, 1000, 1129]

As you can see, each one of the numbers in X can or cannot belong to one interval. Since the intervals intersection is empty, each number can belong to at most one interval.
Besides, by construction, only one number per interval.

I'm trying to know to which interval does every number in X belongs, if any.
So for my example, the ouput should be:

output = [(6, A), (28, B), (None, C), (1000, D), (None, E)]

which means that the numbers 6, 28, 1000 belong to intervals A, B, D respectively, and no number belongs to intervals C and E.

In order to find to which interval does every number in X belongs to, I did the following:

output = []
for interval in intervals:
    for number in X:
        if interval[0] <= number and number <= interval[1]:
            found_interval = True
            output.append((number, interval))
            break

    if not found_interval:
        output.append((None, interval))

This should work, but I thought that there should be a faster way. I would like to avoid having to loop over the X for each interval. An upgraded solution would loop over the remaining numbers who hadn't found any interval.

Is there a faster way of doing this?

benavente · Accepted Answer

You can look-up the intersections for this problem in linear time:

Create two variables to track the current value and interval indices.
As long as you haven't gone over every element in either list, check if the current value belongs to the interval.
If the value belongs to the current interval, increase by one both the current value and current interval indices. Since both the values and the intervals are sorted and the intervals are non-overlapping, no other element can belong to the current interval, so we can safely move along.
If the value is less than the current interval's lower bound, increase the current value index, as the value cannot belong to any interval with a larger lower bound either.

If the value is greater than the current interval's upper bound, increase the current interval index, as any other value would be greater than this bound as well.

def find_intersection(values, intervals):
    output = []
    value_index = 0
    interval_index = 0

    while value index < len(values) and interval_index < len(intervals):
        current_value = values[value_index]
        current_interval = intervals[interval_index]
        lower_bound, upper_bound = current_interval

        if current_value < lower_bound:
            output.append((None, current_value))
            # This value cannot belong to any greater interval.
            value_index += 1
        elif current_value > upper_bound:
            # No other value can belong to this interval either.
            interval_index += 1
        else:
            output.append((current_interval, current_value))
            # At most one value per interval and one interval per value.
            value_index += 1
            interval_index += 1

    # If we ran out of intervals all remaining values do not belong to any.
    for v in values[value_index:]:
        output.append((None, v))

    return output

Worst case scenario, no number belongs to any interval and we have to iterate each list completely (intervals in the while loop and values in the for loop), so the complexity is O(n + m), where n is the number of values and m that of intervals.

Looking for corresponding intervals for points

Answers (2)

Related Questions