Reputation: 11

Trying to delete overlapping nested lists in Python

I'm trying to write a code in python that can remove overlapping lists. My lists are in the format [lower_bound,upper_bound,data,data]. These lists are stored in a larger list. I want to iterate over the larger list, removing any of these smaller lists where the bounds overlap. This is what I have.

def cut_overlapping_ranges(ranges):

  keep_ranges = []
  temp_ranges = ranges

  for range_1 in temp_ranges:
    keep_ranges.append(range_1)
    for range_2 in temp_ranges:
      if range_1 != range_2:
        if ((range_1[0] < range_2[0] < range_1[1]) or (range_1[0] < range_2[1] < range_1[1])):
          temp_ranges.remove(range_2)
  return

Upvotes: 0

Answers (3)

Chris Charley

Reputation: 6633

Here was my guess about the problem. I don't know if two ranges overlap, you want to remove them both or just the second one. My solution removes the second one.

L = [[0, 10], [5, 15], [11, 18], [10, 25], [20, 25], [30, 35]]
L.sort(key= lambda x: x[0])

sets = []
delete_idx = set()

for sub_list in L:
    sets.append(set(range(sub_list[0], sub_list[1]+1)))

for i, set_1 in enumerate(sets[:-1:]):
    if i in delete_idx:
        continue
    for j, set_2 in enumerate(sets[i+1::], start = i+1):
        if len(set_1 & set_2) > 0:
            delete_idx.add(j)

# remove from the greatest idx to smallest idx
for i in sorted(delete_idx, reverse=True):
    L.remove(L[i])

print(L)

It printed:

[[0, 10], [11, 18], [20, 25], [30, 35]]

Upvotes: 0

Feynman Cox

Reputation: 11

This is the solution I found, thanks to help on loop mechanics from juanpa.arrivillaga. I imagine it's not too efficient, for that, look to other answers; however, this works without excess libraries.

def cut_overlapping_ranges(ranges):

  keep_ranges = []
  temp_ranges = ranges.copy()

  while True:
    range_position = 0
    if len(temp_ranges) == 0:
      break
    range_1 = temp_ranges[0]
    lb = range_1[0]
    ub = range_1[1]
    keep_ranges.append(range_1)
    temp_ranges.remove(range_1)
    while True:
      if len(temp_ranges) == 0:
        break
      try:
        range_2 = temp_ranges[range_position]
        if ((lb < range_2[0] < ub) or (lb < range_2[1] < ub)):
            temp_ranges.remove(range_2)
        else:
          range_position += 1
      except IndexError:
        break

  print(range)

  return

input is

[[0.0023, 0.0033],[0.0025, 0.0035],[0.0029, 0.0039],[0.0022, 0.0032],[0.0017, 0.0027],[0.0031, 0.0041],[0.0032, 0.0042],[-0.0005, 0.0005],[0.0014, 0.0024],[-0.0002, 0.0008],[-0.0006,0.0004],[0.0012, 0.0022],[0.0013, 0.0023],[0.0008, 0.0018],[0.0011, 0.0021]]

output is

[[0.0023, 0.0033],[-0.0005, 0.0005],[0.0012, 0.0022]]

Upvotes: 0

blhsing

Reputation: 107095

One efficient approach is to build an interval tree where all nodes to the left of a node have intervals lower than or equal to the lower bound of the node, and all nodes to the right of the node have intervals greater than or equal to the upper bound of the node so that adding an interval to the tree has an average time complexity of O(log n), and construction of the tree overall takes O(n log n) on average.

Since the goal is to retain non-overlapping intervals, we can make an insertion raise an exception to reject an overlapping interval:

class Interval:
    def __init__(self, lower_bound, upper_bound):
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound

class NonOverlappingIntervalTree:
    def __init__(self):
        self.interval = None
        self.left = None
        self.right = None

    def add_interval(self, lower_bound, upper_bound):
        if not self.interval:
            self.interval = Interval(lower_bound, upper_bound)
            return
        if upper_bound <= self.interval.lower_bound:
            if not (node := self.left):
                node = self.left = NonOverlappingIntervalTree()
        elif lower_bound >= self.interval.upper_bound:
            if not (node := self.right):
                node = self.right = NonOverlappingIntervalTree()
        else:
            raise ValueError
        node.add_interval(lower_bound, upper_bound)

so your filter function can just skip an interval when it is found to overlap an existing node in the interval tree when attempting to add it to the tree:

def cut_overlapping_ranges(ranges):
    output = []
    tree = NonOverlappingIntervalTree()
    for lower_bound, upper_bound, *data in ranges:
        try:
            tree.add_interval(lower_bound, upper_bound)
        except ValueError:
            continue
        output.append([lower_bound, upper_bound, *data])
    return output

so that:

cut_overlapping_ranges([[0, 10], [5, 15], [11, 18], [10, 25], [20, 25], [30, 35]])

outputs:

[[0, 10], [11, 18], [20, 25], [30, 35]]

Demo: https://ideone.com/vv48Oh

Upvotes: 2

Trying to delete overlapping nested lists in Python

Answers (3)

Related Questions