Reputation: 2871
I am new in pandas and groupby functionality.
I have Dataframe as shown below, which is a transaction data of customer as shown below, I want to find out the top two Dprtmnt
per Cus_No
based on their total Amount
.
Cus_No Date Dprtmnt Amount
111 6-Jun-18 AAA 100
111 6-Jun-18 AAA 50
111 8-Jun-18 BBB 125
111 8-Aug-18 CCC 130
111 12-Dec-18 BBB 200
111 15-Feb-17 AAA 10
111 18-Jan-18 AAA 20
222 6-Jun-18 DDD 100
222 6-Jun-18 AAA 50
222 8-Jun-18 AAA 125
222 8-Aug-18 DDD 130
222 12-Dec-18 AAA 200
222 15-Feb-17 CCC 10
222 18-Jan-18 CCC 20
My expected output is shown below.
Cus_No Top1D Top1Sum Top1_Frqnc Top2D Top2Sum Top2_Frqnc
111 BBB 325 2 AAA 180 4
222 AAA 375 3 DDD 230 2
Upvotes: 2
Views: 65
Reputation: 863031
First aggregate by GroupBy.agg
with sum
and size
, sort and get top2 by GroupBy.head
, last reshape by DataFrame.unstack
and create new columns names by map
and join
:
df = (df.groupby(['Cus_No','Dprtmnt'])['Amount']
.agg([('Sum','sum'),('Frqnc','size')])
.sort_values('Sum', ascending=False)
.groupby(level=0).head(2))
df = (df.set_index(df.groupby(level=0).cumcount().add(1).astype(str), append=True)
.reset_index(level=1)
.unstack()
.sort_index(axis=1, level=1))
df.columns = df.columns.map(''.join)
df = df.reset_index()
print (df)
RangeIndex(start=0, stop=14, step=1)
Cus_No Dprtmnt1 Frqnc1 Sum1 Dprtmnt2 Frqnc2 Sum2
0 111 BBB 2 325 AAA 4 180
1 222 AAA 3 375 DDD 2 230
Upvotes: 1