Dawit Abate
Dawit Abate

Reputation: 460

Get unions and intersections of list of datetime ranges python

I have two lists of datetime ranges. eg.

l1 = [(datetime.datetime(2018, 8, 29, 1, 0, 0), datetime.datetime(2018, 8, 29, 3, 0, 0)), (datetime.datetime(2018, 8, 29, 6, 0, 0), datetime.datetime(2018, 8, 29, 9, 0, 0))]
l2 = [(datetime.datetime(2018, 8, 29, 2, 0, 0), datetime.datetime(2018, 8, 29, 4, 0, 0)), (datetime.datetime(2018, 8, 29, 5, 0, 0), datetime.datetime(2018, 8, 29, 7, 0, 0))]

And I want to get the union of l1 and l2. The desired outputs are:

union = [(datetime.datetime(2018, 8, 29, 1, 0, 0), datetime.datetime(2018, 8, 29, 4, 0, 0)), (datetime.datetime(2018, 8, 29, 5, 0, 0), datetime.datetime(2018, 8, 29, 9, 0, 0))]
intersection = [(datetime.datetime(2018, 8, 29, 2, 0, 0), datetime.datetime(2018, 8, 29, 3, 0, 0)), (datetime.datetime(2018, 8, 29, 6, 0, 0), datetime.datetime(2018, 8, 29, 7, 0, 0))]

Real data may not be this perfectly aligned.

Upvotes: 3

Views: 4946

Answers (2)

Amadan
Amadan

Reputation: 198436

The answer here is very useful for what you're asking, as it can compact an array of overlapping ranges:

from operator import itemgetter

def consolidate(intervals):
    sorted_intervals = sorted(intervals, key=itemgetter(0))

    if not sorted_intervals:  # no intervals to merge
        return

    # low and high represent the bounds of the current run of merges
    low, high = sorted_intervals[0]

    for iv in sorted_intervals[1:]:
        if iv[0] <= high:  # new interval overlaps current run
            high = max(high, iv[1])  # merge with the current run
        else:  # current run is over
            yield low, high  # yield accumulated interval
            low, high = iv  # start new run

    yield low, high  # end the final run

Union of l1 and l2 is simply the consolidation of all ranges in both l1 and l2:

def union(l1, l2):
    return consolidate([*l1, *l2])

Intersection of l1 and l2 is adequately done by AChampion's code (if there is any overlap between any range in l1 and any range in l2, that overlap deserves to be in the result), but it could lead to fragmentation of ranges; we can use this same function to join the adjacent or overlapping ranges:

from itertools import product

def intersection(l1, l2):
    result = ((max(s1, s2), min(e1, e2)) for (s1, e1), (s2, e2) in product(l1, l2) if s1 < e2 and e1 > s2)
    return consolidate(result)

An example:

l1 = [(1, 7), (4, 8), (10, 15), (20, 30), (50, 60)]
l2 = [(3, 6), (8, 11), (15, 20)]
print(list(union(l1, l2)))         # [(1, 30), (50, 60)]
print(list(intersection(l1, l2)))  # [(3, 6), (10, 11)]

(The example uses integers for clarity, but it works with any comparable type. Specifically, for OP's l1 and l2, the code yields OP's desired datetime results.)

Upvotes: 4

AChampion
AChampion

Reputation: 30278

Your definition of union and intersection for the date range can be simply described as :-

Union:

In []:
from itertools import product
[(min(s1, s2), max(e1, e2)) for (s1, e1), (s2, e2) in product(l1, l2) if s1 <= e2 and e1 >= s2]

Out[]:
[(datetime.datetime(2018, 8, 29, 1, 0), datetime.datetime(2018, 8, 29, 4, 0)),
 (datetime.datetime(2018, 8, 29, 5, 0), datetime.datetime(2018, 8, 29, 9, 0))]

Intersection:

In []:
[(max(s1, s2), min(e1, e2)) for (s1, e1), (s2, e2) in product(l1, l2) if s1 <= e2 and e1 >= s2]

Out[]:
[(datetime.datetime(2018, 8, 29, 2, 0), datetime.datetime(2018, 8, 29, 3, 0)),
 (datetime.datetime(2018, 8, 29, 6, 0), datetime.datetime(2018, 8, 29, 7, 0))]

You can replace <= and >= with < and > if they strictly have to overlap and not just touch.

Upvotes: 1

Related Questions