Reputation: 413
I have a dataframe in pandas like this:
id some_type some_date some_data
0 1 A 19/12/1995 X
1 2 A 10/04/1997 Y
2 2 B 05/03/2013 Z
3 2 B 09/05/2017 W
4 2 B 09/05/2017 R
5 3 A 01/07/1998 M
6 3 B 09/08/2009 N
For each value of id, I need the rows that have the maximum some_type and the maximum some_date, keeping every matching row even when the rows differ only in some_data.
In other words, what I need is the following:
id some_type some_date some_data
0 1 A 19/12/1995 X
3 2 B 09/05/2017 W
4 2 B 09/05/2017 R
6 3 B 09/08/2009 N
Upvotes: 0
Views: 164
Reputation: 29635
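You can do it with sort_values, groupby and apply, keeping the rows whose some_type and some_date equal the last values in each sorted group. Convert some_date to datetime first (the strings are day-first), otherwise the column sorts as text rather than chronologically:
# parse the day/month/year strings so that dates sort chronologically
df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)
df_output = (df.sort_values(by=['some_type','some_date']).groupby('id')
             .apply(lambda df_g: df_g[(df_g['some_type'] == df_g['some_type'].iloc[-1]) &
                                     (df_g['some_date'] == df_g['some_date'].iloc[-1])])
             .reset_index(level=0, drop=True))
and the output is:
id some_type some_date some_data
0 1 A 1995-12-19 X
3 2 B 2017-05-09 W
4 2 B 2017-05-09 R
6 3 B 2009-08-09 N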
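Note that sorting on some_date only gives chronological order if the column holds real dates rather than day/month/year strings. A minimal sketch of the difference, using only the standard library and the three distinct dates of id 2 from the question (the names raw, latest_as_text and latest_as_date are made up for illustration):

```python
from datetime import datetime

# Dates for id == 2, copied from the question (day/month/year strings).
raw = ["10/04/1997", "05/03/2013", "09/05/2017"]

# Plain string comparison goes character by character, so "10/..." wins.
latest_as_text = max(raw)

# Parsing each string first compares real calendar dates.
latest_as_date = max(raw, key=lambda s: datetime.strptime(s, "%d/%m/%Y"))

print(latest_as_text)  # 10/04/1997
print(latest_as_date)  # 09/05/2017
```

The same text-versus-date ordering applies to sort_values and max() on a pandas column of strings, which is why converting with pd.to_datetime before grouping is the safer route.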
EDIT: if you don't care about the index, you can also use merge:
# first get the last row per id after sorting; reset_index turns id back into a column so it takes part in the merge
df_last = df.sort_values(['some_type','some_date']).groupby('id')[['some_type','some_date']].last().reset_index()
# now do an inner merge on the shared columns to keep only the rows you want
df_output = df.merge(df_last, how='inner')
You will get the same rows, only with a fresh 0..n-1 index.
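A self-contained sketch of the merge variant on the question's data, assuming the dates are day/month/year. Keeping id as a column of df_last through reset_index is an assumption on my part, made so that the merge matches on id as well as on some_type and some_date:

```python
import pandas as pd

# Sample frame from the question; dates parsed as day/month/year (assumed).
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 2, 3, 3],
    "some_type": list("AABBBAB"),
    "some_date": pd.to_datetime(
        ["19/12/1995", "10/04/1997", "05/03/2013", "09/05/2017",
         "09/05/2017", "01/07/1998", "09/08/2009"],
        format="%d/%m/%Y",
    ),
    "some_data": list("XYZWRMN"),
})

# Last (some_type, some_date) pair per id after sorting, with id as a column.
df_last = (df.sort_values(["some_type", "some_date"])
             .groupby("id")[["some_type", "some_date"]]
             .last()
             .reset_index())

# Inner merge on the shared columns: id, some_type and some_date.
df_output = df.merge(df_last, how="inner")
print(df_output)
```

The merge builds a new frame, so the original row labels (0, 3, 4, 6) are replaced by 0 to 3.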
Upvotes: 2
Reputation: 18906
Create a boolean mask with groupby, transform and max(), then filter with it. But first convert some_date to datetime (the strings are day-first):
df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)
# True where a row holds its group's maximum in both columns
m = df.groupby('id')[['some_type','some_date']].transform(lambda x: x == x.max()).all(axis=1)
df = df[m]
Full example:
import io
import pandas as pd
text = '''\
id some_type some_date some_data
1 A 19/12/1995 X
2 A 10/04/1997 Y
2 B 05/03/2013 Z
2 B 09/05/2017 W
2 B 09/05/2017 R
3 A 01/07/1998 M
3 B 09/08/2009 N'''
# pd.compat.StringIO was removed from pandas; use the standard library instead
fileobj = io.StringIO(text)
df = pd.read_csv(fileobj, sep=r'\s+')
df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)
m = df.groupby('id')[['some_type','some_date']].transform(lambda x: x == x.max()).all(axis=1)
df = df[m]
print(df)
Returns:
id some_type some_date some_data
0 1 A 1995-12-19 X
3 2 B 2017-05-09 W
4 2 B 2017-05-09 R
6 3 B 2009-08-09 N
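If you prefer to avoid the lambda, a roughly equivalent sketch compares each column with its per-group maximum from transform('max'). The frame is rebuilt from the question's data with day/month/year parsing assumed, and g, mask and result are illustrative names:

```python
import pandas as pd

# Sample frame from the question; dates parsed as day/month/year (assumed).
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 2, 3, 3],
    "some_type": list("AABBBAB"),
    "some_date": pd.to_datetime(
        ["19/12/1995", "10/04/1997", "05/03/2013", "09/05/2017",
         "09/05/2017", "01/07/1998", "09/08/2009"],
        format="%d/%m/%Y",
    ),
    "some_data": list("XYZWRMN"),
})

g = df.groupby("id")

# transform("max") broadcasts each group's maximum back onto its rows,
# so the comparison lines up with the original index.
mask = (df["some_type"].eq(g["some_type"].transform("max"))
        & df["some_date"].eq(g["some_date"].transform("max")))

result = df[mask]
print(result.index.tolist())  # [0, 3, 4, 6]
```

Both columns must reach their group maximum in the same row, so a group whose latest date belongs to a lower some_type would return no rows with this mask.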
Upvotes: 2