soarfy

Reputation: 65

Filtering based on date ranges from another dataframe

I have two pandas dataframes as follows:

df1:

id  date        item
3   2015-11-23  B
3   2015-11-23  A
3   2016-05-11  C
3   2017-02-01  C
3   2018-07-12  E
4   2014-05-11  C
4   2015-02-01  C
4   2018-07-12  E

df2:

id  start       end            
3   2016-05-11  2017-08-30
4   2015-01-11  2017-08-22

I would like to filter df1 so that I keep only the rows whose date falls within the range given in df2 for the same id:

id  date        item
3   2016-05-11  C
3   2017-02-01  C
4   2015-02-01  C

In reality, df1 and df2 each have millions of rows, so quick fixes such as for loops are not an option. I have a rough idea of using groupby on id, but all my attempts have failed so far.

Thank you in advance!

Upvotes: 0

Views: 851

Answers (1)

el_oso

Reputation: 1061

The basic approach is to merge the two dataframes on id, so that every row of df1 carries the start and end dates for its id. You can then keep only the rows whose date falls between those two dates.

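# attach each id's start/end range to every df1 row with that id
df3 = df1.merge(df2, how='inner', on='id')

# keep rows whose date lies within [start, end], inclusive
df3[(df3['date'] >= df3['start']) & (df3['date'] <= df3['end'])]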
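
For reference, here is a self-contained sketch of the merge-then-filter idea using the sample data from the question. It assumes the date columns are real datetimes (hence the pd.to_datetime calls) and that both ends of each range are inclusive. The names merged, in_range and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; dates parsed to datetime64
df1 = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 4, 4, 4],
    "date": pd.to_datetime([
        "2015-11-23", "2015-11-23", "2016-05-11", "2017-02-01",
        "2018-07-12", "2014-05-11", "2015-02-01", "2018-07-12",
    ]),
    "item": ["B", "A", "C", "C", "E", "C", "C", "E"],
})
df2 = pd.DataFrame({
    "id": [3, 4],
    "start": pd.to_datetime(["2016-05-11", "2015-01-11"]),
    "end": pd.to_datetime(["2017-08-30", "2017-08-22"]),
})

# Attach the matching start/end range to every df1 row with the same id
merged = df1.merge(df2, on="id", how="inner")

# Boolean mask: date within [start, end], both bounds inclusive (the default)
in_range = merged["date"].between(merged["start"], merged["end"])

# Keep only the original df1 columns for the rows that pass the filter
result = merged.loc[in_range, ["id", "date", "item"]].reset_index(drop=True)
print(result)
```

Note that the merge produces one row for every (df1 row, df2 range) pair sharing an id. If df2 holds many ranges per id, the intermediate frame can become much larger than df1, which matters at millions of rows.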

Upvotes: 3
