sfactor
sfactor

Reputation: 13062

How to sum values grouped by a categorical column in pandas?

I have data which has a categorical column that groups the data and other columns likes this in a dataframe df.

id      subid      value
1       10         1.5
1       20         2.5
1       30         7.0 
2       10         12.5
2       40         5

What I need is a column that has the average value for each subid within each id. For example df could be:

id      subid      value     id_sum    proportion
1       10         1.5       11.0      0.136
1       20         2.5       11.0      0.227
1       30         7.0       11.0      0.636
2       10         12.5      17.5      0.714
2       40         5         17.5      0.285

Now, I tried getting the id_sum column by doing:

df['id_sum'] = df.groupby('id')['value'].sum()

But this does not seem to work as hoped. My end goal is to get the proportion column. What is the correct way of getting that?

Upvotes: 7

Views: 10553

Answers (1)

ℕʘʘḆḽḘ
ℕʘʘḆḽḘ

Reputation: 19375

here we go

df['id_sum'] = df.groupby('id')['value'].transform('sum')
df['proportion'] = df['value'] / df['id_sum']

Upvotes: 8

Related Questions