Seb
Seb

Reputation: 37

Pandas - Date ranges that doesn't overlap

I'm getting lost in trying to find a easy way to determine when date ranges from 2 data frame are not overlapping.

I have 2 dataframes :

df1 = pd.DataFrame({
    'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
    'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
    'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
    'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})

df1 :
                 START                  END
0  2019-01-01 09:00:00  2019-01-01 16:00:00
1  2019-01-01 18:00:00  2019-01-02 02:00:00

df2 :
                 START                  END
0  2019-01-01 08:00:00  2019-01-01 11:00:00
1  2019-01-01 14:00:00  2019-01-01 15:00:00
2  2019-01-01 22:00:00  2019-01-01 23:00:00
3  2019-01-02 01:00:00  2019-01-02 04:00:00

and I would like get the date range when df1 is not overlapped by df2:

                 START                  END
0  2019-01-01 11:00:00  2019-01-01 14:00:00
1  2019-01-01 15:00:00  2019-01-01 16:00:00
2  2019-01-01 18:00:00  2019-01-01 22:00:00
3  2019-01-01 23:00:00  2019-01-02 01:00:00

Thanks for your help.

Upvotes: 0

Views: 290

Answers (1)

avloss
avloss

Reputation: 2626

sympy and it's Interval could be used, here's an example

from sympy import Interval
from dateutil import parser
from datetime import datetime
import pandas as pd

def remove_overlap(time_intervals, overlapping_intervals):
    time_span = None

    for _, interval in time_intervals.iterrows():
        start_ts = parser.parse(interval["START"]).timestamp()
        end_ts = parser.parse(interval["END"]).timestamp()
        interval_ts = Interval(start_ts, end_ts)
        time_span = interval_ts + time_span if time_span else interval_ts

    for _, interval in overlapping_intervals.iterrows():
        start_ts = parser.parse(interval["START"]).timestamp()
        end_ts = parser.parse(interval["END"]).timestamp()
        interval_ts = Interval(start_ts, end_ts)
        time_span = time_span - interval_ts

    bounds_ts = list(time_span.boundary)
    bounds_dates = [datetime.fromtimestamp(t).strftime("%Y-%m-%d %H:%M:%S") for t in bounds_ts]

    df = pd.DataFrame({"START": bounds_dates[0::2], "END":bounds_dates[1::2]})
    return df

df1 = pd.DataFrame({
    'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
    'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
    'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
    'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})

remove_overlap(df1,df2)

this outputs

START   END
0   2019-01-01 11:00:00 2019-01-01 14:00:00
1   2019-01-01 15:00:00 2019-01-01 16:00:00
2   2019-01-01 18:00:00 2019-01-01 22:00:00
3   2019-01-01 23:00:00 2019-01-02 01:00:00

Upvotes: 1

Related Questions