Reputation: 7458
I have a df
,
acct_no code date id
100 10 01/04/2019 22
100 10 01/03/2019 22
100 10 01/05/2019 22
200 20 01/06/2019 33
200 20 01/05/2019 33
200 20 01/07/2019 33
I want to first sort the df
in ascending order for date
when acct_no
and code
are the same,
df.sort_values(['acct_no', 'code', 'date'], inplace=True)
then I am wondering what the way to find the last row whose acct_no
, code
are the same as the previous row, the result need to look like,
acct_no code date id
100 10 01/05/2019 22
200 20 01/07/2019 33
Upvotes: 2
Views: 1362
Reputation: 75150
You can also try with groupby.last()
:
df.groupby(['acct_no', 'code'],as_index=False).last()
acct_no code date id
0 100 10 01/05/2019 22
1 200 20 01/07/2019 33
Upvotes: 3
Reputation: 863801
Use DataFrame.drop_duplicates
, but first convert column to datetimes:
#if dates are first use dayfirst=True
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
#if months are first
#df['date'] = pd.to_datetime(df['date'])
df1 = (df.sort_values(['acct_no', 'code', 'date'])
.drop_duplicates(['acct_no', 'code'], keep='last'))
print (df1)
acct_no code date id
2 100 10 2019-05-01 22
5 200 20 2019-07-01 33
Upvotes: 2