Reputation: 3551

Reduce a set of intervals to simplest representation

I have a list of "intervals" defined as a list of tuples. I am currently looping through the entire list of intervals and comparing each one to check if a value is in coverage. However because my list of intervals is very large (with a lot of overlap) this is doing a lot of unnecessary work.

For a simplified example:

def in_coverage(x, intervals)
    for start, end in intervals:
        if start <= x <= end:
            return True
    return False

intervals = [(1, 3), (3, 4)]
in_coverage(2, intervals) # => True
in_coverage(7, intervals) # => False

So, in order to reduce the amount of work done by this check I would like to perform a reduction on the list of intervals up front.

How can I efficiently reduce my set of intervals into their simplest representation?

For example:

[(1, 3), (3, 4)] => [(1, 4)]
[(1, 3), (3, 4), (2, 5), (7, 8), (9, 11)] => [(1, 5), (7, 8), (9, 11)]

Upvotes: 1

Answers (3)

Andrew

Reputation: 3061

The simple representation that you are looking for relies a simple observation: if you sort the tuple by low-value then overlapping intervals are adjacent*. Of course there is a caveat, if there are multiple overlaps then two overlapping intervals can be separated by a chain of intervals, but only if the chain is connected. Interval Graphs have some pretty cool properties.

Anyway, this means that you can merge overlapping intervals in a single greedy scan of the sorted list. In pseudo-code something like:

foreach interval in list if it overlaps current current = combined bounds are simple convex hull min(current low,interval low), max(current high, interval high) else current is finished, append to result current = interval

This takes care of minimising the number of intervals in the representation. If you want to maximise the lookup speed then the tree suggested by Ami Tavory is the way to go (I would also guess that his library takes care of this reduction in the size of the interval set implicitly during tree construction). There are a few other python packages that do the same thing: sparse interval set libraries. Try googling for "intervalset python" and check that it doesn't autocorrect it to something similar. I've used pyinter before and it seemed to be small, fast and with a simple interface. Banyan looks more powerful but it certainly does what you want - maybe Ami Tavory could cut'n'paste the 3-line example that shows how to setup and query an integer interval into his answer for clarity?

Upvotes: 3

John Coleman

Reputation: 51998

In the special case that all of the numbers involved are integers you could create the set of covered points as soon as you create your list of intervals and just check if your number is in the set of covered points:

def makeCoverSet(intervals):
    s = set()
    for (a,b) in intervals:
        s.update(range(a,b+1))
    return s

intervals = [(1, 3), (3, 4)]

coverage = makeCoverSet(intervals)

print(2 in coverage) #prints True
print(5 in coverage) #prints False

Upvotes: 1

Ami Tavory

Reputation: 76297

For your type of queries, an Interval Tree can make things much more efficient. One way to do so is using a tree ordered by the 'low' values of the intervals, and an extra annotation is added to every node recording the maximum high value among the tree: the node's high value and the high values of both its subtrees (see more here or in CLRS).

I wrote a PyPI package, banyan, which also does this stuff (see here). It supports exactly your type of queries.

Upvotes: 2

Reduce a set of intervals to simplest representation

Answers (3)

Related Questions