Reputation: 483
I have two dataframes, df1 and df2, df1 has three columns - group, startdate1 and enddate1, and df2 also has three columns, group, startdate2 and enddate2. I'd like to compare for each group in df1, if the interval (startdate1,enddate1) overlaps with any interval of (startdate2,enddate2) for the same group.
I found this post(Is it possible to use Pandas Overlap in a Dataframe?) which used pandas.IntervalIndex.overlaps to check interval overlap. It's very similar to my question, but I'm struggling on how to use groupby for pandas.IntervalIndex.overlaps (or should I use other methods)? Below are some sample data:
df1:
group | startdate1 | enddate1 |
---|---|---|
A | 2017-07-01 | 2018-06-30 |
B | 2017-07-01 | 2018-06-30 |
A | 2018-07-01 | 2019-06-30 |
B | 2019-07-01 | 2020-06-30 |
df2:
group | startdate2 | enddate2 |
---|---|---|
A | 2017-05-01 | 2018-04-30 |
A | 2019-10-01 | 2020-01-31 |
B | 2017-07-02 | 2018-06-29 |
B | 2018-07-01 | 2019-06-30 |
The expected output is to add a column of 1 or 0 in df1 if there's any interval overlap with df2 for the same group. df_output:
group | startdate1 | enddate1 | flag |
---|---|---|---|
A | 2017-07-01 | 2018-06-30 | 1 |
B | 2017-07-01 | 2018-06-30 | 1 |
A | 2018-07-01 | 2019-06-30 | 0 |
B | 2019-07-01 | 2020-06-30 | 0 |
Thank you!
Upvotes: 1
Views: 770
Reputation: 9941
You can make cartesian join within groups, find indexes of records in df1
that overlap by date range with df2
, and then add flag by checking if the index of a record is in that list:
ixs = (df1.reset_index().merge(df2, on=['group'])
.query('(startdate1 < enddate2) & (enddate1 > startdate2)'))['index']
df1.assign(flag=df1.index.isin(ixs).astype(int))
Output:
group startdate1 enddate1 flag
0 A 2017-07-01 2018-06-30 1
1 B 2017-07-01 2018-06-30 1
2 A 2018-07-01 2019-06-30 0
3 B 2019-07-01 2020-06-30 0
P.S. I'm assuming all dates are in datetime
format already, otherwise we need to pd.to_datetime(...)
those columns first
Upvotes: 3