Summing and normalizing pandas dataframe based on other columns

Question

Given this dataframe:

    HOUSEID   PERSONID  HHSTATE  TRPMILES
0   20000017    1         IN    22.000000
1   20000017    1         IN    0.222222
2   20000017    1         IN    22.000000
3   20000017    2         IN    22.000000
4   20000017    2         IN    0.222222
5   20000017    2         IN    0.222222
6   20000231    1         TX    3.000000
7   20000231    1         TX    2.000000
8   20000231    1         TX    6.000000
9   20000231    1         TX    5.000000

I want to sum values in TRPMILES based on unique combination of HOUSEID and PERSONID:

     HOUSEID   PERSONID  HHSTATE  TRPMILES
0   20000017    1         IN    66.666666
1   20000017    2         IN    22.444444
2   20000231    1         TX    16.000000

then normalize the TRPMILES based on HHSTATE. I mean dividing every number of the same HHSTATE category by the maximum number of trip mile in HHSTATE column.

(66.666666 is maximum for IN so dividing it by itself is 1.0000000 and by the second row is 0.3366666)

  HOUSEID   PERSONID  HHSTATE  TRPMILES
0   20000017    1         IN    1.0000000
1   20000017    2         IN    0.3366666
2   20000231    1         TX    1.0000000

I have a big dataframe with many possible values of HHSTATE.

Thank you

BENY · Accepted Answer

Try with groupby then do div

s=df.groupby(['HOUSEID',  'PERSONID', 'HHSTATE'])['TRPMILES'].sum()
s=s.div(s.max(level=[2]),level=2).reset_index()

Summing and normalizing pandas dataframe based on other columns

Answers (1)

Related Questions