norbeq
norbeq

Reputation: 3076

Merging overlapping interval objects with dependencies

i need to merge interval objects to get distinct ranges of intervals based on extra parameters. How is the best way to do that?

It's about unambiguous statement whether in a given hour state is true. The returned list must have non-duplicated intervals.

Interval object description:

{
    'startDate': datetime.datetime, # start of interval
    'endDate': datetime.datetime, # end of interval
    'prioritized': bool # if True - it's always important, override no-prioritized intervals
    'state': bool # result of interval
}

In the examples below i changed startDate/endDate to strings to make them look better.

Interval list look like:

interval_list = [
    {'startDate': '10:00:00', 'endDate': '12:00:00', 'prioritized': False, 'state': False},
    {'startDate': '11:00:00', 'endDate': '18:00:00', 'prioritized': True, 'state': True},
    {'startDate': '13:00:00', 'endDate': '17:00:00', 'prioritized': False, 'state': False},
    {'startDate': '17:00:00', 'endDate': '20:00:00', 'prioritized': False, 'state': True},
    {'startDate': '19:30:00', 'endDate': '19:45:00', 'prioritized': True, 'state': False}
]

I am trying to achieve the following:

merge(interval_list) should return:

[
    {'startDate': '10:00:00', 'endDate': '11:00:00', 'state': False},
    {'startDate': '11:00:00', 'endDate': '19:30:00', 'state': True},
    {'startDate': '19:30:00', 'endDate': '19:45:00', 'state': False},
    {'startDate': '19:45:00', 'endDate': '20:00:00', 'state': True},
]

I have following not completed code right now:

def merge_range(ranges: list):
    ranges = sorted(ranges, key=lambda x: x['startDate'])
    last_interval = dict(ranges[0])

    for current_interval in sorted(ranges, key=lambda x: x['startDate']):
        if current_interval['startDate'] > last_interval['endDate']:
            yield dict(last_interval)
            last_interval['startDate'] = current_interval['startDate']
            last_interval['endDate'] = current_interval['endDate']
            last_interval['prioritized'] = current_interval['prioritized']
            last_interval['state'] = current_interval['state']
        else:
            if current_interval['state'] == last_interval['state']:
                last_interval['endDate'] = max(last_interval['endDate'], current_interval['endDate'])
            else:
                pass # i stopped here

    yield dict(last_interval)

And use it by merged_interval_list = list(merge_range(interval_list))

Is it a good way ?

Upvotes: 1

Views: 182

Answers (2)

jferard
jferard

Reputation: 8180

When we deal with time intervals, the main idea is to sort the dates (start and end) along with their status: start or end. Here, we need an access to the original interval too, to handle priorities and states.

Let's try with this list:

interval_list = [
    {'startDate': '10:00:00', 'endDate': '12:00:00', 'prioritized': False, 'state': False},
    {'startDate': '11:00:00', 'endDate': '18:00:00', 'prioritized': True, 'state': True},
    {'startDate': '13:00:00', 'endDate': '17:00:00', 'prioritized': False, 'state': False},
    {'startDate': '17:00:00', 'endDate': '20:00:00', 'prioritized': False, 'state': True},
    {'startDate': '19:30:00', 'endDate': '19:45:00', 'prioritized': True, 'state': False}
]

First, we convert datestrings to dates (as you did):

import datetime

day = '2019-05-10'
def get_datetime(d, t):
    return datetime.datetime.strptime(d+" "+t, "%Y-%m-%d %H:%M:%S")

for interval in interval_list:
    interval['startDate'] = get_datetime(day, interval['startDate'])
    interval['endDate'] =  get_datetime(day, interval['endDate'])

Now, we build a new list with the needed information:

L = sorted(
    [(interval['startDate'], 1, i) for i, interval in enumerate(interval_list)]
    +[(interval['endDate'], -1, i) for i, interval in enumerate(interval_list)]
)

L is the following list of tuples (date, dir, index) (dir: 1 means it's a start date, -1 means it's an end date):

[(datetime.datetime(2019, 5, 10, 10, 0), 1, 0), (datetime.datetime(2019, 5, 10, 11, 0), 1, 1), (datetime.datetime(2019, 5, 10, 12, 0), -1, 0), (datetime.datetime(2019, 5, 10, 13, 0), 1, 2), (datetime.datetime(2019, 5, 10, 17, 0), -1, 2), (datetime.datetime(2019, 5, 10, 17, 0), 1, 3), (datetime.datetime(2019, 5, 10, 18, 0), -1, 1), (datetime.datetime(2019, 5, 10, 19, 30), 1, 4), (datetime.datetime(2019, 5, 10, 19, 45), -1, 4), (datetime.datetime(2019, 5, 10, 20, 0), -1, 3)]

Now we can iterate over L and keep a track of the current state and priority to yield dates when state is modified according to given priority:

def interval_info(i):
    interval = interval_list[i]
    return interval['state'], interval['prioritized']

T = []
stack = []
for boundary_date, direction, i in L:
    state, prioritized = interval_info(i) # state and priority of the current date
    if direction == 1: # start date
        if stack:
            prev_state, prev_prioritized = interval_info(stack[-1]) # previous infos
            if state != prev_state and prioritized >= prev_prioritized: # enter a new state with a greater or equal priority
                T.append((boundary_date, state)) # enter in new state
        else: # begin of covered area
            T.append((boundary_date, state)) # enter in new state
        stack.append(i) # add the opened interval
    elif direction == -1: # end date
        stack.remove(i) # remove the closed interval (i is a *value* in stack)
        if stack:
            prev_state, prev_prioritized = interval_info(stack[-1])
            if state != prev_state and not prev_prioritized: # leave a non priority state
                T.append((boundary_date, prev_state)) # re-enter in prev state
        else: # end of covered area
            T.append((boundary_date, None)) # enter in None state

The value of T is:

[(datetime.datetime(2019, 5, 10, 10, 0), False), (datetime.datetime(2019, 5, 10, 11, 0), True), (datetime.datetime(2019, 5, 10, 19, 30), False), (datetime.datetime(2019, 5, 10, 19, 45), True), (datetime.datetime(2019, 5, 10, 20, 0), None)]

You can then easily produce the output you wanted. Hope it helps!

EDIT: Bonus: how to convert start dates to time intervals:

>>> import datetime
>>> T = [(datetime.datetime(2019, 5, 10, 10, 0), False), (datetime.datetime(2019, 5, 10, 11, 0), True), (datetime.datetime(2019, 5, 10, 19, 30), False), (datetime.datetime(2019, 5, 10, 19, 45), True), (datetime.datetime(2019, 5, 10, 20, 0), None)]
>>> [{'startDate': s[0], 'endDate': e[0], 'state': s[1]} for s,e in zip(T, T[1:])]
[{'startDate': datetime.datetime(2019, 5, 10, 10, 0), 'endDate': datetime.datetime(2019, 5, 10, 11, 0), 'state': False}, {'startDate': datetime.datetime(2019, 5, 10, 11, 0), 'endDate': datetime.datetime(2019, 5, 10, 19, 30), 'state': True}, {'startDate': datetime.datetime(2019, 5, 10, 19, 30), 'endDate': datetime.datetime(2019, 5, 10, 19, 45), 'state': False}, {'startDate': datetime.datetime(2019, 5, 10, 19, 45), 'endDate': datetime.datetime(2019, 5, 10, 20, 0), 'state': True}]

You just have to zip every start date with the the next one to get intervals.

Upvotes: 0

norbeq
norbeq

Reputation: 3076

I got an answer for this question:

For first i separate events to prioritized and non-prioritize lists.

Based on the priority list, I create a negation of the interval on a given day.

Next i set prioritized list as main list and start iterate over non-prioritize list.

import datetime
from pprint import pprint

df = "%Y-%m-%d %H:%M:%S"
ds = "%Y-%m-%d"

events = {}
prioritized_events = {}

events["2019-05-10"] = [{
    'startDate': datetime.datetime.strptime("2019-05-10 01:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 02:00:00", df),
    'state': True
}, {
    'startDate': datetime.datetime.strptime("2019-05-10 10:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 12:00:00", df),
    'state': False
}, {
    'startDate': datetime.datetime.strptime("2019-05-10 13:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 17:00:00", df),
    'state': False
}, {
    'startDate': datetime.datetime.strptime("2019-05-10 17:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 20:00:00", df),
    'state': True
}]

prioritized_events["2019-05-10"] = [{
    'startDate': datetime.datetime.strptime("2019-05-10 11:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 18:00:00", df),
    'state': True
}, {
    'startDate': datetime.datetime.strptime("2019-05-10 19:30:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 20:00:00", df),
    'state': False
}]

allowed_intervals = []
for event_date in prioritized_events:
    minimal_time = datetime.datetime.combine(datetime.datetime.strptime(event_date, ds), datetime.time.min)
    maximum_time = datetime.datetime.combine(datetime.datetime.strptime(event_date, ds), datetime.time.max)

    for ev in prioritized_events[event_date]:
        if ev['startDate'] != minimal_time:
            allowed_intervals.append({
                'startDate': minimal_time,
                'endDate': ev['startDate']
            })
            minimal_time = ev['endDate']

    if prioritized_events[event_date][len(prioritized_events[event_date]) - 1]['endDate'] != maximum_time:
        allowed_intervals.append({
            'startDate': prioritized_events[event_date][len(prioritized_events[event_date]) - 1]['endDate'],
            'endDate': maximum_time
        })

for event_date in events:
    if event_date not in prioritized_events:
        prioritized_events[event_date] = events[event_date]
    else:
        for ev in events[event_date]:
            start = ev['startDate']
            end = ev['endDate']
            state = ev['state']
            done = False
            for allowed_interval in allowed_intervals:
                if start >= allowed_interval['startDate'] and end <= allowed_interval['endDate']:
                    prioritized_events[event_date].append({
                        'startDate': start,
                        'endDate': end,
                        'state': state
                    })
                    done = True
                    break
                elif allowed_interval['startDate'] <= start < allowed_interval['endDate'] < end:
                    prioritized_events[event_date].append({
                        'startDate': start,
                        'endDate': allowed_interval['endDate'],
                        'state': state
                    })
                    start = allowed_interval['endDate']
                elif start < allowed_interval['startDate'] and start < allowed_interval['endDate'] < end:
                    prioritized_events[event_date].append({
                        'startDate': allowed_interval['startDate'],
                        'endDate': allowed_interval['endDate'],
                        'state': state
                    })
                    start = allowed_interval['endDate']
                elif start < allowed_interval['startDate'] and start < allowed_interval['endDate'] and allowed_interval['startDate'] < end <= allowed_interval['endDate']:
                    prioritized_events[event_date].append({
                        'startDate': allowed_interval['startDate'],
                        'endDate': end,
                        'state': state
                    })
                    start = end
            if done:
                continue

    prioritized_events[event_date] = sorted(prioritized_events[event_date], key=lambda k: k['startDate'])

And now sorted list:

pprint(prioritized_events["2019-05-10"])

returns:

[
 {'startDate': datetime.datetime(2019, 5, 10, 1, 0),
  'endDate': datetime.datetime(2019, 5, 10, 2, 0),
  'state': True
 },
 {'startDate': datetime.datetime(2019, 5, 10, 10, 0),
  'endDate': datetime.datetime(2019, 5, 10, 11, 0),
  'state': False
 },
 {'startDate': datetime.datetime(2019, 5, 10, 11, 0),
  'endDate': datetime.datetime(2019, 5, 10, 18, 0),
  'state': True
 },
 {'startDate': datetime.datetime(2019, 5, 10, 18, 0),
  'endDate': datetime.datetime(2019, 5, 10, 19, 30),
  'state': True
 },
 {'startDate': datetime.datetime(2019, 5, 10, 19, 30),
  'endDate': datetime.datetime(2019, 5, 10, 20, 0),
  'state': False
 }
]

Upvotes: 1

Related Questions