segment overlapping regions into disjoint regions

Question

Given a set of closed regions [a,b] where a and b are integers I need to find another set of regions that cover the same numbers but are disjoint.

I suppose it is possible to do naively by iterating through the set several times, but I am looking for a recommendation of a good algorithm for this. Please help.

EDIT:

to clarify, the resulting regions cannot be larger than the original ones, I have to come up with disjoint regions that are contained by the original ones. In other words, I need to split the original regions on the boundaries where they overlap.

example:

3,8
1,4
7,9
11,14

result:

1,3
3,4
4,7
7,8
8,9
11,14

John Coleman · Accepted Answer

(This is a modification of an answer that I posted earlier today which I deleted after I discovered it had a logic error. I later realized that I could modify Vincent van der Weele's elegant idea of using parenthesis depth to fix the bug)

On Edit: Modified to be able to accept intervals of length 0

Call an interval [a,a] of length 0 essential if a doesn't also appear as an endpoint of any interval of length > 0. For example, in [1,3], [2,2], [3,3], [4,4] the 0-length intervals [2,2] and [4,4] are essential but [3,3] isn't. Inessential 0-length intervals are redundant thus need not appear in the final output. When the list of intervals is initially scanned (loading the basic data structures) points corresponding to 0-length intervals are recorded, as are endpoint of intervals of length > 0. When the scan is completed, two instances of each point corresponding to essential 0-length intervals are added into the list of endpoints, which is then sorted. The resulting data structure is a multiset where the only repetitions correspond to essential 0-length intervals.

For every endpoint in the interval define the pdelta (parentheses delta) of the endpoint as the number of times that point appears as a left endpoint minus the number of times it appears as a right endpoint. Store these in a dictionary keyed by the endpoints.

[a,b] where a,b are the first two elements of the list of endpoints is the first interval in the list of disjoint intervals. Define the parentheses depth of b to be the sum of pdelta[a] and pdelta[b]. We loop through the rest of the endpoints as follows:

In each pass through the loop look at the parenthesis depth of b. If it is not 0 than b is still needed for one more interval. Let a = b and let the new p be the next value in the list. Adjust the parentheses depth be the pdelta of the new b and add [a,b] to the list of disjoint intervals. Otherwise (if the parenthesis depth of b is 0) let the next [a,b] be the next two points in the list and adjust the parenthesis depth accordingly.

Here is a Python implementation:

def disjointify(intervals):
    if len(intervals) == 0: return []
    pdelta = {}
    ends = set()
    disjoints = []
    onePoints = set() #onePoint intervals
    for (a,b) in intervals:
        if a == b:
            onePoints.add(a)
            if not a in pdelta: pdelta[a] = 0
        else:
            ends.add(a)
            ends.add(b)
            pdelta[a] = pdelta.setdefault(a,0) + 1
            pdelta[b] = pdelta.setdefault(b,0) - 1
    onePoints.difference_update(ends)
    ends = list(ends)
    for a in onePoints:
        ends.extend([a,a])
    ends.sort()
    a = ends[0]
    b = ends[1]
    pdepth = pdelta[a] + pdelta[b]
    i = 1
    disjoints.append((a,b))
    while i < len(ends) - 1:
        if pdepth != 0:
            a = b
            b = ends[i+1]
            pdepth += pdelta[b]
            i += 1
        else:
            a = ends[i+1]
            b = ends[i+2]
            pdepth += (pdelta[a] + pdelta[b])
            i += 2
        disjoints.append((a,b))
    return disjoints

Sample output which illustrates various edge cases:

>>> example = [(1,1), (1,4), (2,2), (4,4),(5,5), (6,8), (7,9), (10,10)]
>>> disjointify(example)
[(1, 2), (2, 2), (2, 4), (5, 5), (6, 7), (7, 8), (8, 9), (10, 10)]

>>> disjointify([(1,1), (2,2)])
[(1, 1), (2, 2)]

(I am using Python tuples to represent the closed intervals even though it has the minor drawback of looking like the standard mathematical notation for open intervals).

A final remark: referring to the result as a collection of disjoint interval might not be accurate since some of these intervals have nonempty albeit 1-point intersections

segment overlapping regions into disjoint regions

Answers (2)

Related Questions