Reputation: 5210
I have a dataframe that contains three columns
df = pd.DataFrame({'col1': ['a','a','b','b'], 'col2':['a','b','c','d'], 'col3': [1,2,3,4]})
I would like to get the proportion of the third column in the group defined by the first column, but I would also like to carry over the values from the second column, so I can get something like this:
col1 col2 col3
0 a a 0.333333
1 a b 0.666667
2 b c 0.428571
3 b d 0.571429
I can do the proportion using group/apply:
df.groupby('col1').apply(lambda l: l.col3 / l.col3.sum()).reset_index()
col1 level_1 col3
0 a 0 0.333333
1 a 1 0.666667
2 b 2 0.428571
3 b 3 0.571429
But not sure how to include the second column.
Upvotes: 0
Views: 317
Reputation: 353309
I'm not sure exactly what "carry over the values from the second column" means, but IIUC you just don't want those values to be missing from your final output. In that case, don't get rid of them:
>>> df["col3"] = df["col3"] / df.groupby("col1")["col3"].transform(sum)
>>> df
col1 col2 col3
0 a a 0.333333
1 a b 0.666667
2 b c 0.428571
3 b d 0.571429
Here we've used transform
, which means "perform a groupby operation and then broadcast the result back up to the original index":
>>> df.groupby("col1")["col3"].transform(sum)
0 3
1 3
2 7
3 7
dtype: int64
This gives us the right denominators to divide by.
Upvotes: 2