Reputation: 53
Given a DataFrame I want to make a new DataFrame for rows with the max timestamp, for a combo of columns
Combo: category, revision, type, subtype
sub_type may/not have a value (but the None is part of its unique-ness)
I won't have duplicates based on the above (no ties on timestamp)
action category YYYYMM timestamp sub_type type
0 buy A 202002 4 None apple
1 sell A 202002 5 None apple
2 buy A 202002 4 green apple
3 buy A 202002 4 red apple
4 sell A 202002 3 red apple
5 sell A 202002 1 None orange
6 sell B 202002 6 None apple
The result for the above DataFrame would look like this:
action category revision timestamp sub_type type
0 sell A 202002 5 None apple
1 buy A 202002 4 green apple
2 buy A 202002 4 red apple
3 sell A 202002 1 None orange
4 sell B 202002 6 None apple
Basically - I want the last action for a combo of attributes
Upvotes: 0
Views: 1699
Reputation: 323226
So we need use fillna
here ,since None == None will return True. After that we can do sort_values
then drop_duplicates
out = df.sort_values('timestamp').fillna('None').\
drop_duplicates(['category','sub_type','YYYYMM','type'],keep='last').\
sort_index()
out
Out[128]:
action category YYYYMM timestamp sub_type type
1 sell A 202002 5 None apple
2 buy A 202002 4 green apple
3 buy A 202002 4 red apple
5 sell A 202002 1 None orange
6 sell B 202002 6 None apple
Upvotes: 2