Reputation: 37
I'm getting lost in trying to find a easy way to determine when date ranges from 2 data frame are not overlapping.
I have 2 dataframes :
df1 = pd.DataFrame({
'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})
df1 :
START END
0 2019-01-01 09:00:00 2019-01-01 16:00:00
1 2019-01-01 18:00:00 2019-01-02 02:00:00
df2 :
START END
0 2019-01-01 08:00:00 2019-01-01 11:00:00
1 2019-01-01 14:00:00 2019-01-01 15:00:00
2 2019-01-01 22:00:00 2019-01-01 23:00:00
3 2019-01-02 01:00:00 2019-01-02 04:00:00
and I would like get the date range when df1 is not overlapped by df2:
START END
0 2019-01-01 11:00:00 2019-01-01 14:00:00
1 2019-01-01 15:00:00 2019-01-01 16:00:00
2 2019-01-01 18:00:00 2019-01-01 22:00:00
3 2019-01-01 23:00:00 2019-01-02 01:00:00
Thanks for your help.
Upvotes: 0
Views: 290
Reputation: 2626
sympy
and it's Interval
could be used, here's an example
from sympy import Interval
from dateutil import parser
from datetime import datetime
import pandas as pd
def remove_overlap(time_intervals, overlapping_intervals):
time_span = None
for _, interval in time_intervals.iterrows():
start_ts = parser.parse(interval["START"]).timestamp()
end_ts = parser.parse(interval["END"]).timestamp()
interval_ts = Interval(start_ts, end_ts)
time_span = interval_ts + time_span if time_span else interval_ts
for _, interval in overlapping_intervals.iterrows():
start_ts = parser.parse(interval["START"]).timestamp()
end_ts = parser.parse(interval["END"]).timestamp()
interval_ts = Interval(start_ts, end_ts)
time_span = time_span - interval_ts
bounds_ts = list(time_span.boundary)
bounds_dates = [datetime.fromtimestamp(t).strftime("%Y-%m-%d %H:%M:%S") for t in bounds_ts]
df = pd.DataFrame({"START": bounds_dates[0::2], "END":bounds_dates[1::2]})
return df
df1 = pd.DataFrame({
'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})
remove_overlap(df1,df2)
this outputs
START END
0 2019-01-01 11:00:00 2019-01-01 14:00:00
1 2019-01-01 15:00:00 2019-01-01 16:00:00
2 2019-01-01 18:00:00 2019-01-01 22:00:00
3 2019-01-01 23:00:00 2019-01-02 01:00:00
Upvotes: 1