Reputation: 65
I have two pandas dataframes as follows:
df1:
id date item
3 2015-11-23 B
3 2015-11-23 A
3 2016-05-11 C
3 2017-02-01 C
3 2018-07-12 E
4 2014-05-11 C
4 2015-02-01 C
4 2018-07-12 E
df2:
id start end
3 2016-05-11 2017-08-30
4 2015-01-11 2017-08-22
I would like to filter df1 so that I keep only the rows whose date falls within the date range given in df2 for the same id:
id date item
3 2016-05-11 C
3 2017-02-01 C
4 2015-02-01 C
In reality, df1 and df2 have millions of rows, so quick fixes such as for loops are not an option. I have a rough idea of grouping by id, but all my attempts have failed so far.
Thank you in advance!
Upvotes: 0
Views: 851
Reputation: 1061
The basic way is to merge the two dataframes on id, so that every row of df1 also carries the start and end dates for its id. You can then filter on whether each row's date falls between those two dates.
df3 = df1.merge(df2, how='inner', left_on='id', right_on='id')
df3[(df3['date'] <= df3['end']) & (df3['date'] >= df3['start'])]
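For reference, here is a minimal self-contained sketch of the merge-then-filter idea using the sample data from the question. It assumes the date columns arrive as strings and converts them with pd.to_datetime first; the names mask and result are only illustrative, and both range endpoints are treated as inclusive, which matches the expected output in the question.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 4, 4, 4],
    "date": ["2015-11-23", "2015-11-23", "2016-05-11", "2017-02-01",
             "2018-07-12", "2014-05-11", "2015-02-01", "2018-07-12"],
    "item": ["B", "A", "C", "C", "E", "C", "C", "E"],
})
df2 = pd.DataFrame({
    "id": [3, 4],
    "start": ["2016-05-11", "2015-01-11"],
    "end": ["2017-08-30", "2017-08-22"],
})

# Assumption: the columns are strings, so parse them into real datetimes.
df1["date"] = pd.to_datetime(df1["date"])
df2["start"] = pd.to_datetime(df2["start"])
df2["end"] = pd.to_datetime(df2["end"])

# Attach each id's start/end to every df1 row with that id.
df3 = df1.merge(df2, how="inner", on="id")

# Keep rows whose date lies inside [start, end], endpoints included.
mask = (df3["date"] >= df3["start"]) & (df3["date"] <= df3["end"])
result = df3.loc[mask, ["id", "date", "item"]].reset_index(drop=True)

print(result)
```

Both the merge and the comparison are vectorized, so no Python-level loop is involved. One thing to keep in mind: if df2 can hold several ranges for the same id, the merge produces one row per (df1 row, range) pair, so the intermediate dataframe can be larger than df1.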
Upvotes: 3