Arthur Morgan
Arthur Morgan

Reputation: 145

How to find max value by group with conditions? Pandas

Product ID    Date      Sales
15475400    8/27/2019   23858
15475400    7/16/2019   21161
15475400    3/26/2018   31907
17104000    8/24/2019   39170
17104000    7/4/2019    29070
17104000    6/15/2019   41963
17104000    1/21/2019   38783
17169000    8/18/2018   58936
17169000    6/18/2018   47273
17169000    2/26/2018   28845
10842800    8/3/2019    41816
10842800    3/8/2019    41916
14901100    8/23/2019   37616

Greetings! I have above dataframe and I want to find the most recent records for each product before 8/1/2019.

I tried df.groupby('Product ID').timestamp.max() but don't know how to find products before 8/1/2019.

Expected outputs:

Product ID  Date        Sales
15475400    7/16/2019   21161
17104000    7/4/2019    29070
17169000    6/18/2018   47273
10842800    3/8/2019    41916
14901100    8/23/2019   37616

Thank you in advance for the help!

Upvotes: 1

Views: 778

Answers (1)

BENY
BENY

Reputation: 323226

First we need to filter the df then just using drop_duplicates

df['Date']=pd.to_datetime(df['Date'])
s=df.loc[df.Date<'2019-08-01'].sort_values('Date').drop_duplicates('ProductID',keep='last')
s
Out[277]: 
   ProductID       Date  Sales
6   17169000 2018-06-18  47273
8   10842800 2019-03-08  41916
3   17104000 2019-07-04  29070
1   15475400 2019-07-16  21161

Or we can do tail with groupby

df.loc[df.Date<'2019-08-01'].sort_values('Date').groupby('ProductID').tail(1)

Or idxmax

df.loc[df.loc[df.Date<'2019-08-01'].groupby('ProductID').Date.idxmax()]

Upvotes: 4

Related Questions