tomasyany
tomasyany

Reputation: 1242

Looking for corresponding intervals for points

I have 5 tuples A, B, C, D and E representing intervals. Their intersection is empty (for each pair of them) and they are sort such as the upper limit of one interval is smaller than the lower limit of the following interval.

For example:

A = (5, 10)
B = (21, 29)
C = (134, 160)
D = (900, 1050)
E = (1080, 1100)

intervals = [A, B, C, D, E]

I also have a list X of points sorted in increasing order.

For example:

X = [6, 28, 130, 1000, 1129]

As you can see, each one of the numbers in X can or cannot belong to one interval. Since the intervals intersection is empty, each number can belong to at most one interval.
Besides, by construction, only one number per interval.

I'm trying to know to which interval does every number in X belongs, if any.
So for my example, the ouput should be:

output = [(6, A), (28, B), (None, C), (1000, D), (None, E)]

which means that the numbers 6, 28, 1000 belong to intervals A, B, D respectively, and no number belongs to intervals C and E.

In order to find to which interval does every number in X belongs to, I did the following:

output = []
for interval in intervals:
    for number in X:
        if interval[0] <= number and number <= interval[1]:
            found_interval = True
            output.append((number, interval))
            break

    if not found_interval:
        output.append((None, interval))

This should work, but I thought that there should be a faster way. I would like to avoid having to loop over the X for each interval. An upgraded solution would loop over the remaining numbers who hadn't found any interval.

Is there a faster way of doing this?

Upvotes: 3

Views: 669

Answers (2)

benavente
benavente

Reputation: 1264

You can look-up the intersections for this problem in linear time:

  • Create two variables to track the current value and interval indices.
  • As long as you haven't gone over every element in either list, check if the current value belongs to the interval.
  • If the value belongs to the current interval, increase by one both the current value and current interval indices. Since both the values and the intervals are sorted and the intervals are non-overlapping, no other element can belong to the current interval, so we can safely move along.
  • If the value is less than the current interval's lower bound, increase the current value index, as the value cannot belong to any interval with a larger lower bound either.
  • If the value is greater than the current interval's upper bound, increase the current interval index, as any other value would be greater than this bound as well.

    def find_intersection(values, intervals):
        output = []
        value_index = 0
        interval_index = 0
    
        while value index < len(values) and interval_index < len(intervals):
            current_value = values[value_index]
            current_interval = intervals[interval_index]
            lower_bound, upper_bound = current_interval
    
            if current_value < lower_bound:
                output.append((None, current_value))
                # This value cannot belong to any greater interval.
                value_index += 1
            elif current_value > upper_bound:
                # No other value can belong to this interval either.
                interval_index += 1
            else:
                output.append((current_interval, current_value))
                # At most one value per interval and one interval per value.
                value_index += 1
                interval_index += 1
    
        # If we ran out of intervals all remaining values do not belong to any.
        for v in values[value_index:]:
            output.append((None, v))
    
        return output
    

Worst case scenario, no number belongs to any interval and we have to iterate each list completely (intervals in the while loop and values in the for loop), so the complexity is O(n + m), where n is the number of values and m that of intervals.

Upvotes: 2

John Coleman
John Coleman

Reputation: 51998

[On Edit: my original code didn't handle the case where x is one of the right endpoints correctly. The revised code fixes that, with the test examples extended to show how it handles other edge cases]

Maybe this will help: Create a list of all endpoints in sorted order like

ends = [5,10,21,29,134,160,900,1050,1080,1100]

and use the bisect module to find where a point, x, lies in this list. This is a binary search hence is more efficient than your linear search. If it falls between two indices (i-1,i) where i is odd then x is in the corresponding interval. Otherwise it is in none.

Also, it is easy enough to use your list of tuples intervals to load up the sorted list of endpoints:

from bisect import bisect

def place(x,endpoints):
    i = bisect(endpoints,x)
    if i%2 == 0:
        if x == endpoints[i-1]:
            return endpoints[i-1],endpoints[i]
        else:
            return None
    else:
        return endpoints[i-1],endpoints[i]

A = (5, 10)
B = (21, 29)
C = (134, 160)
D = (900, 1050)
E = (1080, 1100)

intervals = [A, B, C, D, E]

ends = []
for interval in intervals:
    ends.extend(interval)

xs = [3, 5, 6, 10, 28, 130, 1000, 1129]

for x in xs:
    print(str(x),':',str(place(x,ends)))

Output:

3 : None
5 : (5, 10)
6 : (5, 10)
10 : (10, 21)
28 : (21, 29)
130 : None
1000 : (900, 1050)
1129 : None

Upvotes: 2

Related Questions