ali bakhtiari
ali bakhtiari

Reputation: 1097

Summing and normalizing pandas dataframe based on other columns

Given this dataframe:

    HOUSEID   PERSONID  HHSTATE  TRPMILES
0   20000017    1         IN    22.000000
1   20000017    1         IN    0.222222
2   20000017    1         IN    22.000000
3   20000017    2         IN    22.000000
4   20000017    2         IN    0.222222
5   20000017    2         IN    0.222222
6   20000231    1         TX    3.000000
7   20000231    1         TX    2.000000
8   20000231    1         TX    6.000000
9   20000231    1         TX    5.000000

I want to sum values in TRPMILES based on unique combination of HOUSEID and PERSONID:

     HOUSEID   PERSONID  HHSTATE  TRPMILES
0   20000017    1         IN    66.666666
1   20000017    2         IN    22.444444
2   20000231    1         TX    16.000000

then normalize the TRPMILES based on HHSTATE. I mean dividing every number of the same HHSTATE category by the maximum number of trip mile in HHSTATE column.

(66.666666 is maximum for IN so dividing it by itself is 1.0000000 and by the second row is 0.3366666)

  HOUSEID   PERSONID  HHSTATE  TRPMILES
0   20000017    1         IN    1.0000000
1   20000017    2         IN    0.3366666
2   20000231    1         TX    1.0000000

I have a big dataframe with many possible values of HHSTATE.

Thank you

Upvotes: 0

Views: 44

Answers (1)

BENY
BENY

Reputation: 323276

Try with groupby then do div

s=df.groupby(['HOUSEID',  'PERSONID', 'HHSTATE'])['TRPMILES'].sum()
s=s.div(s.max(level=[2]),level=2).reset_index()

Upvotes: 2

Related Questions