How to check for interval overlap for grouped item on two dataframes?

Question

I have two dataframes, df1 and df2, df1 has three columns - group, startdate1 and enddate1, and df2 also has three columns, group, startdate2 and enddate2. I'd like to compare for each group in df1, if the interval (startdate1,enddate1) overlaps with any interval of (startdate2,enddate2) for the same group.

I found this post(Is it possible to use Pandas Overlap in a Dataframe?) which used pandas.IntervalIndex.overlaps to check interval overlap. It's very similar to my question, but I'm struggling on how to use groupby for pandas.IntervalIndex.overlaps (or should I use other methods)? Below are some sample data:

df1:

group	startdate1	enddate1
A	2017-07-01	2018-06-30
B	2017-07-01	2018-06-30
A	2018-07-01	2019-06-30
B	2019-07-01	2020-06-30

df2:

group	startdate2	enddate2
A	2017-05-01	2018-04-30
A	2019-10-01	2020-01-31
B	2017-07-02	2018-06-29
B	2018-07-01	2019-06-30

The expected output is to add a column of 1 or 0 in df1 if there's any interval overlap with df2 for the same group. df_output:

group	startdate1	enddate1	flag
A	2017-07-01	2018-06-30	1
B	2017-07-01	2018-06-30	1
A	2018-07-01	2019-06-30	0
B	2019-07-01	2020-06-30	0

Thank you!

perl · Accepted Answer

You can make cartesian join within groups, find indexes of records in df1 that overlap by date range with df2, and then add flag by checking if the index of a record is in that list:

ixs = (df1.reset_index().merge(df2, on=['group'])
         .query('(startdate1 < enddate2) & (enddate1 > startdate2)'))['index']

df1.assign(flag=df1.index.isin(ixs).astype(int))

Output:

  group startdate1   enddate1  flag
0     A 2017-07-01 2018-06-30     1
1     B 2017-07-01 2018-06-30     1
2     A 2018-07-01 2019-06-30     0
3     B 2019-07-01 2020-06-30     0

P.S. I'm assuming all dates are in datetime format already, otherwise we need to pd.to_datetime(...) those columns first

How to check for interval overlap for grouped item on two dataframes?

Answers (1)

Related Questions