pandas select rows with the max value of some columns for each different value of another column

I have a dataframe in pandas like this:

    id  some_type   some_date   some_data
0   1   A           19/12/1995  X
1   2   A           10/04/1997  Y
2   2   B           05/03/2013  Z
3   2   B           09/05/2017  W
4   2   B           09/05/2017  R
5   3   A           01/07/1998  M
6   3   B           09/08/2009  N

For each value of id, I need the rows that have the maximum value of some_type and some_date, without dropping any tied rows (every some_data value on those rows must be kept).

In other words, what I need is the following:

    id  some_type   some_date   some_data
0   1   A           19/12/1995  X
3   2   B           09/05/2017  W
4   2   B           09/05/2017  R
6   3   B           09/08/2009  N

Upvotes: 0

Views: 164

Answers (2)

Ben.T

Reputation: 29635

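You can do it with sort_values, groupby and apply, keeping the rows whose some_type and some_date equal those of the last row of each sorted group. The dates are dd/mm/yyyy strings, so convert them to datetime first; otherwise they sort alphabetically rather than chronologically:

df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)

df_output = (df.sort_values(by=['some_type','some_date']).groupby('id')
                .apply(lambda df_g: df_g[(df_g['some_type'] == df_g['some_type'].iloc[-1]) &
                                          (df_g['some_date'] == df_g['some_date'].iloc[-1])])
                  .reset_index(level=0, drop=True))

and the output is:

   id some_type  some_date some_data
0   1         A 1995-12-19         X
3   2         B 2017-05-09         W
4   2         B 2017-05-09         R
6   3         B 2009-08-09         N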
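
The same idea can be written without groupby.apply, whose treatment of the grouping column has varied between pandas versions. This is only a minimal sketch, assuming the question's data and dd/mm/yyyy dates; the names keys, ordered, last, mask and result are illustrative and not part of the original answer:

```python
import pandas as pd

# The question's data, rebuilt by hand for a self-contained example.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 2, 3, 3],
    "some_type": ["A", "A", "B", "B", "B", "A", "B"],
    "some_date": ["19/12/1995", "10/04/1997", "05/03/2013",
                  "09/05/2017", "09/05/2017", "01/07/1998", "09/08/2009"],
    "some_data": ["X", "Y", "Z", "W", "R", "M", "N"],
})
# Assumption: the dates are day-first, as 19/12/1995 suggests.
df["some_date"] = pd.to_datetime(df["some_date"], format="%d/%m/%Y")

keys = ["some_type", "some_date"]
ordered = df.sort_values(keys)

# Broadcast the last (some_type, some_date) pair of each id to all of its rows.
last = ordered.groupby("id")[keys].transform("last")

# Keep the rows that match that pair on both columns; ties are preserved.
mask = (ordered[keys] == last).all(axis=1)
result = ordered[mask].sort_index()
print(result)
```

Because the mask is aligned with the original rows, the original index (0, 3, 4, 6) is kept without any reset_index step.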

EDIT: if you don't care about the index, you can also use merge:

# first get the last row per id after sorting
# (reset_index turns id back into a column so that it takes part in the merge)
df_last = (df.sort_values(['some_type','some_date'])
             .groupby('id')[['some_type','some_date']].last()
             .reset_index())
# now merge with inner on id, some_type and some_date to keep the rows you want
df_output = df.merge(df_last, how='inner')

You will get the same result, apart from the index.
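
If the original index does matter, one possible workaround is to carry it through the merge as an ordinary column. This is a hedged sketch rather than part of the answer above; it assumes the question's data with day-first dates, and the names keys, df_last and df_output_indexed are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2, 2, 2, 3, 3],
    "some_type": ["A", "A", "B", "B", "B", "A", "B"],
    "some_date": ["19/12/1995", "10/04/1997", "05/03/2013",
                  "09/05/2017", "09/05/2017", "01/07/1998", "09/08/2009"],
    "some_data": ["X", "Y", "Z", "W", "R", "M", "N"],
})
df["some_date"] = pd.to_datetime(df["some_date"], format="%d/%m/%Y")

keys = ["some_type", "some_date"]

# Last (some_type, some_date) pair per id after sorting, with id as a column.
df_last = df.sort_values(keys).groupby("id")[keys].last().reset_index()

# reset_index() stores the old index in a column named "index",
# which survives the merge and is restored afterwards.
df_output_indexed = (
    df.reset_index()
      .merge(df_last, on=["id"] + keys, how="inner")
      .set_index("index")
      .rename_axis(None)
)
print(df_output_indexed)
```

Naming the merge keys explicitly with on= also documents that id is part of the match, not only some_type and some_date.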

Upvotes: 2

Anton vBR

Reputation: 18906

Create a boolean mask with groupby, transform and max(). But first convert to datetime (the dates are dd/mm/yyyy, hence dayfirst=True):

df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)
m = df.groupby('id')[['some_type','some_date']].transform(lambda x: x == x.max()).all(axis=1)
df = df[m]

Full example:

import io

import pandas as pd

text = '''\
id  some_type   some_date   some_data
1   A           19/12/1995  X
2   A           10/04/1997  Y
2   B           05/03/2013  Z
2   B           09/05/2017  W
2   B           09/05/2017  R
3   A           01/07/1998  M
3   B           09/08/2009  N'''

# read the whitespace-separated table from an in-memory buffer
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# the dates are dd/mm/yyyy
df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)

# True where a row holds both the max some_type and the max some_date of its id
m = df.groupby('id')[['some_type','some_date']].transform(lambda x: x == x.max()).all(axis=1)

df = df[m]

print(df)

Returns:

   id some_type  some_date some_data
0   1         A 1995-12-19         X
3   2         B 2017-05-09         W
4   2         B 2017-05-09         R
6   3         B 2009-08-09         N
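
Note that the mask compares each column with its own group maximum independently, whereas sorting and taking the last row picks the highest some_type first and then the latest some_date within it. On the question's data both give the same rows. The sketch below uses hypothetical data (not from the question) where the latest date belongs to the lower some_type, to show how the two readings can differ:

```python
import pandas as pd

# Hypothetical data: for id 1 the latest date sits on some_type "A".
df = pd.DataFrame({
    "id": [1, 1],
    "some_type": ["A", "B"],
    "some_date": pd.to_datetime(["2020-01-01", "2019-01-01"]),
    "some_data": ["P", "Q"],
})
keys = ["some_type", "some_date"]
g = df.groupby("id")

# Independent maxima: a row must hold the max of BOTH columns at once.
independent = df[
    (df["some_type"] == g["some_type"].transform("max"))
    & (df["some_date"] == g["some_date"].transform("max"))
]

# Lexicographic maximum: highest some_type, then latest some_date within it.
top = df.sort_values(keys).groupby("id")[keys].last().reset_index()
lexicographic = df.merge(top, on=["id"] + keys, how="inner")

print(len(independent))                     # no row satisfies both maxima
print(lexicographic["some_data"].tolist())  # the "B" row is selected
```

Which behaviour is wanted depends on how "max value of some_type and some_date" is meant, so it is worth checking against data where the two columns do not peak on the same row.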

Upvotes: 2
