Reputation: 191
I have a dataframe which have 4 column User_id,Transaction_id,product and datetime. For each user i have to select his top n recent transaction,Let assume n=2, My dataframe is like:
transaction_id user_id product date
T1 U1 P1 2019-03-27
T1 U1 P2 2019-03-27
T1 U1 P3 2019-03-27
T2 U1 P2 2019-03-21
T2 U1 P3 2019-03-21
T3 U1 P2 2019-03-20
I tried to do it by taking help of this group by pandas dataframe and select latest in each group
What i am expecting as a output is:
transaction_id user_id product date
T1 U1 P1 2019-03-27
T1 U1 P2 2019-03-27
T1 U1 P3 2019-03-27
T2 U1 P2 2019-03-21
T2 U1 P3 2019-03-21
Upvotes: 1
Views: 37
Reputation: 862641
Idea is remove duplicates first by DataFrame.drop_duplicates
, get top2 values per groups and DataFrame.merge
original DataFrame:
df = (df.merge(df.drop_duplicates(['user_id','date'])
.sort_values('date',ascending = False)
.groupby('user_id')
.head(2)[['user_id','date']])
)
print (df)
transaction_id user_id product date
0 T1 U1 P1 2019-03-27
1 T1 U1 P2 2019-03-27
2 T1 U1 P3 2019-03-27
3 T2 U1 P2 2019-03-21
4 T2 U1 P3 2019-03-21
Upvotes: 1