Juan C
Juan C

Reputation: 6132

Check if date in one dataframe is between two dates in another dataframe, by group

I have the following problem. I've got a dataframe with start and end dates for each group. There might be more than one start and end date per group, like this:

group   start_date    end_date
 1      2020-01-03    2020-03-03
 1      2020-05-03    2020-06-03
 2      2020-02-03    2020-06-03

And another dataframe with one row per date, per group, like this:

group   date
 1     2020-01-03
 1     2020-02-03
 1     2020-03-03
 1     2020-04-03
 1     2020-05-03
 1     2020-06-03
 2     2020-02-03
 3     2020-03-03
 4     2020-04-03
      .
      .

So I want to create a column is_between in an efficient way, ideally avoiding loops, so I get the following dataframe

group   date          is_between
 1     2020-01-03        1
 1     2020-02-03        1
 1     2020-03-03        1
 1     2020-04-03        0
 1     2020-05-03        1
 1     2020-06-03        1
 2     2020-02-03        1
 3     2020-03-03        1
 4     2020-04-03        1
      .
      .

So it gets a 1 when a group's date is between the dates in the first dataframe. I'm guessing some combination of groupby, where, between and maybe map might do it, but I'm not finding the correct one. Any ideas?

Upvotes: 2

Views: 1513

Answers (2)

Ben.T
Ben.T

Reputation: 29635

you could try with merge_asof, by the group and on the date and start_date, then check where the date is less than end_date and finally assign back to the original df2

ser = (pd.merge_asof(df2.reset_index() #for later index alignment
                        .sort_values('date'), 
                     df1.sort_values('start_date'), 
                     by='group', 
                     left_on='date', right_on='start_date', 
                     direction='backward')
         .assign(is_between=lambda x: x.date<=x.end_date)
         .set_index(['index'])['is_between']
        )

df2['is_between'] = ser.astype(int)

print (df2)
   group       date  is_between
0      1 2020-01-03           1
1      1 2020-02-03           1
2      1 2020-03-03           1
3      1 2020-04-03           0
4      1 2020-05-03           1
5      1 2020-06-03           1
6      2 2020-02-03           1
7      3 2020-03-03           0
8      4 2020-04-03           0

Upvotes: 1

Juan C
Juan C

Reputation: 6132

Based on @YOBEN_S and @Quang Hoang's advice this made it:

df = df.merge(dic_dates, how='left')
df['is_between'] = np.where(df.date.between(pd.to_datetime(df.start_date),
                                                                   pd.to_datetime(df.end_Date)),1, 0)
df = (df.sort_values(by=['group', 'date', 'is_between'])
            .drop_duplicates(subset=['group', 'date'], keep='last'))

Upvotes: 2

Related Questions