Get non-overlapping distinct intervals from a set of intervals Python

Question

Given a set of intervals, I would like to find non-overlapping distinct intervals from a set of intervals.

For example:

Input: [[1,10], [5,20], [6,21], [17,25], [22,23], [24,50] ,[30,55], [60,70]]

Output: [[1,5], [21,22], [23,24], [25,30], [50,55], [60,70]]

How can I do that?

What I have currently tried:

gene_bounds_list = [[1,10],[5,20], [6,21],[17,25],[22,23], [24,50],[30,55],[60,70]]
overlap_list = []
nonoverlap_list = []
nonoverlap_list.append(gene_bounds_list[0])

for i in range(1, len(gene_bounds_list)):
    curr_gene_bounds = gene_bounds_list[i]
    prev_gene_bounds = nonoverlap_list[-1]

    if curr_gene_bounds[0]prev_gene_bounds[1]:
            # previous gene was completely overlapping within current gene,
            # so replace previous gene by current (bigger) gene and put previous gene into overlap list
            overlap_list.append(nonoverlap_list[-1])
            new_bound = [gene_bounds_list[i][0], gene_bounds_list[i][1]]
            nonoverlap_list.pop()
            nonoverlap_list.append([new_bound[0], new_bound[1]])

    elif curr_gene_bounds[0] > prev_gene_bounds[0] and curr_gene_bounds[1] < prev_gene_bounds[1]:
        # completely within another gene
        overlap_list.append([curr_gene_bounds[0], curr_gene_bounds[1]])

    elif curr_gene_bounds[0] < prev_gene_bounds[1]:
        # partially overlapping with another gene
        new_bound = [nonoverlap_list[-1][1], curr_gene_bounds[1]]
        nonoverlap_list[-1][1] = curr_gene_bounds[0]
        nonoverlap_list.append([new_bound[0], new_bound[1]])

    else:
        # not overlapping with another gene
        nonoverlap_list.append([gene_bounds_list[i][0], gene_bounds_list[i][1]])

unique_data = [list(x) for x in set(tuple(x) for x in gene_bounds_list)]
within_overlapping_intervals = []

for small in overlap_list:
    for master in unique_data:
        if (small[0]==master[0] and small[1]==master[1]):
            continue
        if (small[0]>master[0] and small[1]



However, this gives me an output of: [[1, 5], [6, 10], [17, 20], [21, 22], [23, 24], [25, 30], [50, 55], [60, 70]]

Any ideas of how I can remove the unwanted intervals?

Thierry Lathuille · Accepted Answer

Here is a way to do it. The idea is to keep track of the number of layers of intervals at any point. For this, we add one layer when entering an interval, and remove one when exiting.

We start by building sorted lists of starts and ends. In order to identify the if a value is a start or an end, we create tuples (start, 1) or (end, -1).

Then, we merge these two lists, sorting by value, and iterate over the resulting list (using heapq.merge makes this easy). Each time the number of layers changes to 1, we have the start of a non-overlapping interval. When it changes again, it's the end of it.

from heapq import merge


def non_overlapping(data):
    out = []
    starts = sorted([(i[0], 1) for i in data])  # start of interval adds a layer of overlap
    ends = sorted([(i[1], -1) for i in data])   # end removes one
    layers = 0
    current = []
    for value, event in merge(starts, ends):    # sorted by value, then ends (-1) before starts (1)
        layers += event
        if layers ==1:  # start of a new non-overlapping interval
            current.append(value)
        elif current:  # we either got out of an interval, or started an overlap
            current.append(value)
            out.append(current)
            current = []
    return out


data = [[1,10], [5,20], [6,21], [17,25], [22,23], [24,50] ,[30,55], [60,70]]

non_overlapping(data)
# [[1, 5], [21, 22], [23, 24], [25, 30], [50, 55], [60, 70]]

Note that the expected answer you put in your question is wrong (for example, it contains a 45 that isn't part of the input data)

Get non-overlapping distinct intervals from a set of intervals Python

Answers (2)

Related Questions