Reputation: 145
Product ID Date Sales
15475400 8/27/2019 23858
15475400 7/16/2019 21161
15475400 3/26/2018 31907
17104000 8/24/2019 39170
17104000 7/4/2019 29070
17104000 6/15/2019 41963
17104000 1/21/2019 38783
17169000 8/18/2018 58936
17169000 6/18/2018 47273
17169000 2/26/2018 28845
10842800 8/3/2019 41816
10842800 3/8/2019 41916
14901100 8/23/2019 37616
Greetings! I have above dataframe and I want to find the most recent records for each product before 8/1/2019
.
I tried df.groupby('Product ID').timestamp.max()
but don't know how to find products before 8/1/2019
.
Expected outputs:
Product ID Date Sales
15475400 7/16/2019 21161
17104000 7/4/2019 29070
17169000 6/18/2018 47273
10842800 3/8/2019 41916
14901100 8/23/2019 37616
Thank you in advance for the help!
Upvotes: 1
Views: 778
Reputation: 323226
First we need to filter the df then just using drop_duplicates
df['Date']=pd.to_datetime(df['Date'])
s=df.loc[df.Date<'2019-08-01'].sort_values('Date').drop_duplicates('ProductID',keep='last')
s
Out[277]:
ProductID Date Sales
6 17169000 2018-06-18 47273
8 10842800 2019-03-08 41916
3 17104000 2019-07-04 29070
1 15475400 2019-07-16 21161
Or we can do tail
with groupby
df.loc[df.Date<'2019-08-01'].sort_values('Date').groupby('ProductID').tail(1)
Or idxmax
df.loc[df.loc[df.Date<'2019-08-01'].groupby('ProductID').Date.idxmax()]
Upvotes: 4