Reputation: 11
I'm trying to write a code in python that can remove overlapping lists. My lists are in the format [lower_bound,upper_bound,data,data]
. These lists are stored in a larger list. I want to iterate over the larger list, removing any of these smaller lists where the bounds overlap. This is what I have.
def cut_overlapping_ranges(ranges):
keep_ranges = []
temp_ranges = ranges
for range_1 in temp_ranges:
keep_ranges.append(range_1)
for range_2 in temp_ranges:
if range_1 != range_2:
if ((range_1[0] < range_2[0] < range_1[1]) or (range_1[0] < range_2[1] < range_1[1])):
temp_ranges.remove(range_2)
return
Upvotes: 0
Views: 124
Reputation: 6633
Here was my guess about the problem. I don't know if two ranges overlap, you want to remove them both or just the second one. My solution removes the second one.
L = [[0, 10], [5, 15], [11, 18], [10, 25], [20, 25], [30, 35]]
L.sort(key= lambda x: x[0])
sets = []
delete_idx = set()
for sub_list in L:
sets.append(set(range(sub_list[0], sub_list[1]+1)))
for i, set_1 in enumerate(sets[:-1:]):
if i in delete_idx:
continue
for j, set_2 in enumerate(sets[i+1::], start = i+1):
if len(set_1 & set_2) > 0:
delete_idx.add(j)
# remove from the greatest idx to smallest idx
for i in sorted(delete_idx, reverse=True):
L.remove(L[i])
print(L)
It printed:
[[0, 10], [11, 18], [20, 25], [30, 35]]
Upvotes: 0
Reputation: 11
This is the solution I found, thanks to help on loop mechanics from juanpa.arrivillaga. I imagine it's not too efficient, for that, look to other answers; however, this works without excess libraries.
def cut_overlapping_ranges(ranges):
keep_ranges = []
temp_ranges = ranges.copy()
while True:
range_position = 0
if len(temp_ranges) == 0:
break
range_1 = temp_ranges[0]
lb = range_1[0]
ub = range_1[1]
keep_ranges.append(range_1)
temp_ranges.remove(range_1)
while True:
if len(temp_ranges) == 0:
break
try:
range_2 = temp_ranges[range_position]
if ((lb < range_2[0] < ub) or (lb < range_2[1] < ub)):
temp_ranges.remove(range_2)
else:
range_position += 1
except IndexError:
break
print(range)
return
input is
[[0.0023, 0.0033],[0.0025, 0.0035],[0.0029, 0.0039],[0.0022, 0.0032],[0.0017, 0.0027],[0.0031, 0.0041],[0.0032, 0.0042],[-0.0005, 0.0005],[0.0014, 0.0024],[-0.0002, 0.0008],[-0.0006,0.0004],[0.0012, 0.0022],[0.0013, 0.0023],[0.0008, 0.0018],[0.0011, 0.0021]]
output is
[[0.0023, 0.0033],[-0.0005, 0.0005],[0.0012, 0.0022]]
Upvotes: 0
Reputation: 107095
One efficient approach is to build an interval tree where all nodes to the left of a node have intervals lower than or equal to the lower bound of the node, and all nodes to the right of the node have intervals greater than or equal to the upper bound of the node so that adding an interval to the tree has an average time complexity of O(log n), and construction of the tree overall takes O(n log n) on average.
Since the goal is to retain non-overlapping intervals, we can make an insertion raise an exception to reject an overlapping interval:
class Interval:
def __init__(self, lower_bound, upper_bound):
self.lower_bound = lower_bound
self.upper_bound = upper_bound
class NonOverlappingIntervalTree:
def __init__(self):
self.interval = None
self.left = None
self.right = None
def add_interval(self, lower_bound, upper_bound):
if not self.interval:
self.interval = Interval(lower_bound, upper_bound)
return
if upper_bound <= self.interval.lower_bound:
if not (node := self.left):
node = self.left = NonOverlappingIntervalTree()
elif lower_bound >= self.interval.upper_bound:
if not (node := self.right):
node = self.right = NonOverlappingIntervalTree()
else:
raise ValueError
node.add_interval(lower_bound, upper_bound)
so your filter function can just skip an interval when it is found to overlap an existing node in the interval tree when attempting to add it to the tree:
def cut_overlapping_ranges(ranges):
output = []
tree = NonOverlappingIntervalTree()
for lower_bound, upper_bound, *data in ranges:
try:
tree.add_interval(lower_bound, upper_bound)
except ValueError:
continue
output.append([lower_bound, upper_bound, *data])
return output
so that:
cut_overlapping_ranges([[0, 10], [5, 15], [11, 18], [10, 25], [20, 25], [30, 35]])
outputs:
[[0, 10], [11, 18], [20, 25], [30, 35]]
Demo: https://ideone.com/vv48Oh
Upvotes: 2