Reputation: 1495
How does one return only the observations from a dataframe that hold the max date for each unique key? I tried groupby and max, but because the flag values differ I get more than one row back per key. See the example below:
import pandas as pd
key = [111,111,222,333,444,555]
flag = [0,1,1,1,1,1]
date = [pd.to_datetime('2020-01-01'),pd.to_datetime('2020-01-02'),
pd.to_datetime('2020-02-01'),pd.to_datetime('2020-04-01'),
pd.to_datetime('2020-03-01'),pd.to_datetime('2020-05-01')]
df_dic = {'key':key, 'flag':flag, 'date':date}
df = pd.DataFrame(df_dic)
df
df.groupby(['key', 'flag']).agg({'date':'max'}).reset_index()
This returns:
key flag date
0 111 0 2020-01-01
1 111 1 2020-01-02
2 222 1 2020-02-01
3 333 1 2020-04-01
4 444 1 2020-03-01
5 555 1 2020-05-01
I want to return only the observations with the max date per unique key, like so:
key flag date
0 111 1 2020-01-02
1 222 1 2020-02-01
2 333 1 2020-04-01
3 444 1 2020-03-01
4 555 1 2020-05-01
Upvotes: 1
Views: 232
Reputation: 29635
You can avoid groupby and instead use sort_values together with drop_duplicates:
print(df.sort_values('date').drop_duplicates('key', keep='last'))
key flag date
1 111 1 2020-01-02
2 222 1 2020-02-01
4 444 1 2020-03-01
3 333 1 2020-04-01
5 555 1 2020-05-01
Note that you can add the key column to the sort_values call if the order of the result matters to you, e.g. df.sort_values(['key', 'date'])..., as sketched below.
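For example, a quick sketch of that variant on the same df from the question (the rows now come back ordered by key rather than by date):
# sorting by key first keeps the result in key order
print(df.sort_values(['key', 'date']).drop_duplicates('key', keep='last'))
key flag date
1 111 1 2020-01-02
2 222 1 2020-02-01
3 333 1 2020-04-01
4 444 1 2020-03-01
5 555 1 2020-05-01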
Upvotes: 1
Reputation: 9257
One possible solution (although maybe not the best one) is to use an inner join after the groupby. This ensures that the flags are taken from the original dataframe, whether they are 0 or 1.
df.groupby(['key'])\
.agg({'date': 'max'})\
.reset_index()\
.merge(df, how='inner', on=['key', 'date'])
# key date flag
# 0 111 2020-01-02 1
# 1 222 2020-02-01 1
# 2 333 2020-04-01 1
# 3 444 2020-03-01 1
# 4 555 2020-05-01 1
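As a side note, a similar result can be reached without the merge, e.g. with idxmax (just a sketch, assuming the date column has no NaT values so every key has a well-defined max):
# pick, for each key, the row whose date is that key's maximum
df.loc[df.groupby('key')['date'].idxmax()]
# key flag date
# 1 111 1 2020-01-02
# 2 222 1 2020-02-01
# 3 333 1 2020-04-01
# 4 444 1 2020-03-01
# 5 555 1 2020-05-01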
Upvotes: 1