user2338068
user2338068

Reputation: 193

Python - Removing overlapping lists

Say I have a list of lists that has indexes [[start, end], [start1, end1], [start2, end2]].

Like for example :

[[0, 133], [78, 100], [25, 30]].

How would I get check for overlap between among the lists and remove the list with the longer length each time? So:

>>> list = [[0, 133], [78, 100], [25, 30]]
>>> foo(list)
[[78, 100], [25, 30]]

This is what I tried to do so far:

def cleanup_list(list):
    i = 0
    c = 0
    x = list[:]
    end = len(x)
    while i < end-1:
        for n in range(x[i][0], x[i][1]):
            if n in range(x[i+1][0], x[i+1][1]):
                list.remove(max(x[i], x[i+1]))
        i +=1
    return list

But in addition to being kind of convoluted it's not working properly:

 >>>cleanup_list([[0,100],[9,10],[12,90]])
 [[0, 100], [12, 90]]

Any help would be appreciated!

EDIT:

Other examples would be:

>>>a = [[0, 100], [4, 20], [30, 35], [30, 78]]
>>>foo(a)
[[4, 20], [30, 35]]

>>>b = [[30, 70], [25, 40]]
>>>foo(b)
[[25, 40]]

I'm basically trying to remove all of the longest lists that overlap with another list. In this case I don't have to worry about the lists being of equal length.

Thanks!!

Upvotes: 14

Views: 5412

Answers (5)

AlessandroEmm
AlessandroEmm

Reputation: 698

This may not be the fastest solution, but really verbose and clear - I think.

a = [[2,100], [4,10], [77,99], [38,39], [44,80], [69,70], [88, 90]]

# build ranges first
def expand(list):
    newList = []
    for r in list:
        newList.append(range(r[0], r[1] + 1))
    return newList


def compare(list):
    toBeDeleted = []
    for index1 in range(len(list)):
        for index2 in range(len(list)):
            if index1 == index2:
                # we dont want to compare ourselfs
                continue
            matches = [x for x in list[index1] if x in list[index2]]
            if len(matches) != 0: # do we have overlap?
                ## compare lengths and get rid of the longer one
                if   len(list[index1]) > len(list[index2]):
                    toBeDeleted.append(index1)
                    break
                elif len(list[index1]) < len(list[index2]):
                    toBeDeleted.append(index2)
    # distinct
    toBeDeleted = [ toBeDeleted[i] for i,x in enumerate(toBeDeleted) if x not in toBeDeleted[i+1:]] 
    print len(list)
    # remove items
    for i in toBeDeleted[::-1]:
        del list[i] 
    return list


print(compare(expand(a)))

Upvotes: 3

jfs
jfs

Reputation: 414139

To remove a minimal number of intervals from the list such that the intervals that are left do not overlap, O(n*log n) algorithm exists:

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point
    L = sorted(intervals, key=lambda (start, end): (end, (end - start)),
               reverse=True) # O(n*logn)
    iv = build_interval_tree(intervals) # O(n*log n)
    result = []
    while L: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(L.pop()) # O(1)
        # remove intervals that overlap with the popped interval
        overlapping_intervals = iv.pop(result[-1]) # O(log n + m)
        remove(overlapping_intervals, from_=L) 
    return result

It should produce the following results:

f = maximize_nonoverlapping_count
assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]]
assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]]
assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]]
assert f([[30, 70], [25, 40]]) == [[25, 40]]

It requires the data structure that can find in O(log n + m) time all intervals that overlap with the given interval e.g., IntervalTree. There are implementations that can be used from Python e.g., quicksect.py, see Fast interval intersection methodologies for the example usage.


Here's a quicksect-based O(n**2) implementation of the above algorithm:

from quicksect import IntervalNode

class Interval(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.removed = False

def maximize_nonoverlapping_count(intervals):
    intervals = [Interval(start, end) for start, end in intervals]
    # sort by the end-point
    intervals.sort(key=lambda x: (x.end, (x.end - x.start)))   # O(n*log n)
    tree = build_interval_tree(intervals) # O(n*log n)
    result = []
    for smallest in intervals: # O(n) (without the loop body)
        # pop the interval with the smallest end-point, keep it in the result
        if smallest.removed:
            continue # skip removed nodes
        smallest.removed = True
        result.append([smallest.start, smallest.end]) # O(1)

        # remove (mark) intervals that overlap with the popped interval
        tree.intersect(smallest.start, smallest.end, # O(log n + m)
                       lambda x: setattr(x.other, 'removed', True))
    return result

def build_interval_tree(intervals):
    root = IntervalNode(intervals[0].start, intervals[0].end,
                        other=intervals[0])
    return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x),
                  intervals[1:], root)

Note: the time complexity in the worst case is O(n**2) for this implementation because the intervals are only marked as removed e.g., imagine such input intervals that len(result) == len(intervals) / 3 and there were len(intervals) / 2 intervals that span the whole range then tree.intersect() would be called n/3 times and each call would execute x.other.removed = True at least n/2 times i.e., n*n/6 operations in total:

n = 6
intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]])
result = [[0, 10], [10, 20]]

Here's a banyan-based O(n log n) implementation:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point O(n log n)
    sorted_intervals = SortedSet(intervals,
                                 key=lambda (start, end): (end, (end - start)))
    # build "interval" tree O(n log n)
    tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator)
    result = []
    while sorted_intervals: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(sorted_intervals.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
        sorted_intervals -= overlapping_intervals # O(m log n)
    return result

Note: this implementation considers [0, 10] and [10, 20] intervals to be overlapping:

f = maximize_nonoverlapping_count
assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10] ,[11, 20]]
assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10] ,[15, 40]]

sorted_intervals and tree can be merged:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # build "interval" tree sorted by the end-point O(n log n)
    tree = SortedSet(intervals, key=lambda (start, end): (end, (end - start)),
                     updator=OverlappingIntervalsUpdator)
    result = []
    while tree: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(tree.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
    return result

Upvotes: 11

Joel Cornett
Joel Cornett

Reputation: 24788

In general, two intervals are overlapping if:

min([upperBoundOfA, upperBoundOfB]) >= max([lowerBoundOfA, lowerBoundOfB])

If this is the case, the union of those intervals is:

(min([lowerBoundOfA, lowerBoundOfB]), max([upperBoundOfA, upperBoundOfB])

Similarly, the intersection of those intervals will be:

(min([upperBoundOfA, upperBoundOfB]), max([lowerBoundOfA, lowerBoundOfB]))

Upvotes: 1

Herrington Darkholme
Herrington Darkholme

Reputation: 6315

I think one problem in your code is that it does not handle with the situation where one list contains another. For example, [0,100] contains [9,10]. When you loop n in [0,100] and n enters [9,10], the condition statement if n in range(x[i+1][0], x[i+1][1]) is triggered. Then the builtin function max will compare [0, 100] and [9, 10], and unfortuantely max will return [9,10] because it compare the first number in the list. Thus you remove the wrong element.

I'm trying another way to achieve the effect you want. Rather than manipulating the list itself, I create a new list. Conditionally appending new element to it if it meets our requirements.

def cleanup_list(lists):
    ranges = []
    for l in lists:
        to_insert = True
        for i in ranges:
            r = range(i[0],i[1])
            # if l overlaps with i, but l does not contain i
            if l[0] in r or l[1] in r:
                if (l[1]-l[0]) < len(r):
                    ranges.remove(i)
                else:
                    to_insert = False
            # l contains i
            if l[0]<i[0] and l[1]>i[1]:
                to_insert = False
        if to_insert:
            ranges.append(l)
    return ranges

Upvotes: 2

richselian
richselian

Reputation: 731

  1. ascending sort all items by length.

  2. add them to a segment tree and ignore overlapped items.

Upvotes: 1

Related Questions