Merge overlapping datetime intervals

Question

I have a df with multiple datetime intervals (start time, end time) and values.

input:

id   start          end            value
1    08:00:00.000   12:00:00.000   5
2    09:00:00.000   10:00:00.000   6
2    10:00:00.000   14:00:00.000   4
1    12:00:00.000   15:00:00.000   3

expected output:

id   start          end            value 
1    08:00:00.000   09:00:00.000   5
2    09:00:00.000   10:00:00.000   6
1    10:00:00.000   12:00:00.000   5
2    12:00:00.000   14:00:00.000   4
1    14:00:00.000   15:00:00.000   3

There is overlap between some of them. The objective is to have a succession of interval without overlapping.

diagram

When there is overlapping, I want to keep the interval with the highest value.

I coded a thing that loops on the df to find the overlapping intervals, create a new succession of intervals based on the condition and remove the old ones. I would like to find an alternative way, better optimized. Maybe with split of all intervals at the intersections and after, loop on the df and remove the intervals that overlap based on condition.

done = True

while done:
    done = False
    df_copy = df
    for i, row in df.iterrows():
        row_interval = pd.Interval(row.start, row.end)
        if done:
            break
        for j, row_copy in row_copy.iterrows():
            row_copy_interval = pd.Interval(row_copy.start, row_copy.end)
            if i is not j and row_interval.overlaps(row_copy_interval):
                earliest_start = np.minimum(row.start, row_copy.start)
                latest_start = np.maximum(row.start, row_copy.start)
                earliest_end = np.minimum(row.end, row_copy.end)
                latest_end = np.maximum(row.end, row_copy.end)

                if row.value > row_copy.value:
                    value = row.value
                else:
                    value = row_copy.value

                if row_interval == pd.Interval(earliest_start, earliest_end):
                    df = df.append('value': row.value, 'start': earliest_start,'end': latest_start}, ignore_index=True)
                    df = df.append('value': value, 'start': latest_start,'end': earliest_end}, ignore_index=True)
                    df = df.append('value': row_copy.value, 'start': earliest_end,'end': latest_end}, ignore_index=True)
                elif row_interval == pd.Interval(earliest_start, latest_end):
                    ...
                elif row_interval == pd.Interval(latest_start, latest_end):
                    ...
                elif row_interval == pd.Interval(latest_start, earliest_end):
                    ...

                df = df.drop([i, j]).drop_duplicates()
                done = True
                break

Merge overlapping datetime intervals

Answers (1)

Related Questions