Reputation: 1242
I have 5 tuples A, B, C, D
and E
representing intervals. Their intersection is empty (for each pair of them) and they are sort such as the upper limit of one interval is smaller than the lower limit of the following interval.
For example:
A = (5, 10)
B = (21, 29)
C = (134, 160)
D = (900, 1050)
E = (1080, 1100)
intervals = [A, B, C, D, E]
I also have a list X
of points sorted in increasing order.
For example:
X = [6, 28, 130, 1000, 1129]
As you can see, each one of the numbers in X
can or cannot belong to one interval. Since the intervals intersection is empty, each number can belong to at most one interval.
Besides, by construction, only one number per interval.
I'm trying to know to which interval does every number in X
belongs, if any.
So for my example, the ouput should be:
output = [(6, A), (28, B), (None, C), (1000, D), (None, E)]
which means that the numbers 6, 28, 1000
belong to intervals A, B, D
respectively, and no number belongs to intervals C
and E
.
In order to find to which interval does every number in X
belongs to, I did the following:
output = []
for interval in intervals:
for number in X:
if interval[0] <= number and number <= interval[1]:
found_interval = True
output.append((number, interval))
break
if not found_interval:
output.append((None, interval))
This should work, but I thought that there should be a faster way. I would like to avoid having to loop over the X
for each interval. An upgraded solution would loop over the remaining numbers who hadn't found any interval.
Is there a faster way of doing this?
Upvotes: 3
Views: 669
Reputation: 1264
You can look-up the intersections for this problem in linear time:
If the value is greater than the current interval's upper bound, increase the current interval index, as any other value would be greater than this bound as well.
def find_intersection(values, intervals):
output = []
value_index = 0
interval_index = 0
while value index < len(values) and interval_index < len(intervals):
current_value = values[value_index]
current_interval = intervals[interval_index]
lower_bound, upper_bound = current_interval
if current_value < lower_bound:
output.append((None, current_value))
# This value cannot belong to any greater interval.
value_index += 1
elif current_value > upper_bound:
# No other value can belong to this interval either.
interval_index += 1
else:
output.append((current_interval, current_value))
# At most one value per interval and one interval per value.
value_index += 1
interval_index += 1
# If we ran out of intervals all remaining values do not belong to any.
for v in values[value_index:]:
output.append((None, v))
return output
Worst case scenario, no number belongs to any interval and we have to iterate each list completely (intervals
in the while loop and values
in the for loop), so the complexity is O(n + m)
, where n
is the number of values and m
that of intervals.
Upvotes: 2
Reputation: 51998
[On Edit: my original code didn't handle the case where x
is one of the right endpoints correctly. The revised code fixes that, with the test examples extended to show how it handles other edge cases]
Maybe this will help: Create a list of all endpoints in sorted order like
ends = [5,10,21,29,134,160,900,1050,1080,1100]
and use the bisect
module to find where a point, x, lies in this list. This is a binary search hence is more efficient than your linear search. If it falls between two indices (i-1,i) where i is odd then x is in the corresponding interval. Otherwise it is in none.
Also, it is easy enough to use your list of tuples intervals
to load up the sorted list of endpoints:
from bisect import bisect
def place(x,endpoints):
i = bisect(endpoints,x)
if i%2 == 0:
if x == endpoints[i-1]:
return endpoints[i-1],endpoints[i]
else:
return None
else:
return endpoints[i-1],endpoints[i]
A = (5, 10)
B = (21, 29)
C = (134, 160)
D = (900, 1050)
E = (1080, 1100)
intervals = [A, B, C, D, E]
ends = []
for interval in intervals:
ends.extend(interval)
xs = [3, 5, 6, 10, 28, 130, 1000, 1129]
for x in xs:
print(str(x),':',str(place(x,ends)))
Output:
3 : None
5 : (5, 10)
6 : (5, 10)
10 : (10, 21)
28 : (21, 29)
130 : None
1000 : (900, 1050)
1129 : None
Upvotes: 2