Reputation: 1097
Given this dataframe:
HOUSEID PERSONID HHSTATE TRPMILES
0 20000017 1 IN 22.000000
1 20000017 1 IN 0.222222
2 20000017 1 IN 22.000000
3 20000017 2 IN 22.000000
4 20000017 2 IN 0.222222
5 20000017 2 IN 0.222222
6 20000231 1 TX 3.000000
7 20000231 1 TX 2.000000
8 20000231 1 TX 6.000000
9 20000231 1 TX 5.000000
I want to sum values in TRPMILES
based on unique combination of HOUSEID
and PERSONID
:
HOUSEID PERSONID HHSTATE TRPMILES
0 20000017 1 IN 66.666666
1 20000017 2 IN 22.444444
2 20000231 1 TX 16.000000
then normalize the TRPMILES
based on HHSTATE
. I mean dividing every number of the same HHSTATE
category by the maximum number of trip mile in HHSTATE
column.
(66.666666 is maximum for IN
so dividing it by itself is 1.0000000 and by the second row is 0.3366666)
HOUSEID PERSONID HHSTATE TRPMILES
0 20000017 1 IN 1.0000000
1 20000017 2 IN 0.3366666
2 20000231 1 TX 1.0000000
I have a big dataframe with many possible values of HHSTATE
.
Thank you
Upvotes: 0
Views: 44
Reputation: 323276
Try with groupby
then do div
s=df.groupby(['HOUSEID', 'PERSONID', 'HHSTATE'])['TRPMILES'].sum()
s=s.div(s.max(level=[2]),level=2).reset_index()
Upvotes: 2