Pandas group proportion with additional columns

Question

I have a dataframe that contains three columns

df = pd.DataFrame({'col1': ['a','a','b','b'], 'col2':['a','b','c','d'], 'col3': [1,2,3,4]})

I would like to get the proportion of the third column in the group defined by the first column, but I would also like to carry over the values from the second column, so I can get something like this:

  col1     col2      col3
0    a        a  0.333333
1    a        b  0.666667
2    b        c  0.428571
3    b        d  0.571429

I can do the proportion using group/apply:

df.groupby('col1').apply(lambda l: l.col3 / l.col3.sum()).reset_index()
  col1  level_1      col3
0    a        0  0.333333
1    a        1  0.666667
2    b        2  0.428571
3    b        3  0.571429

But not sure how to include the second column.

DSM · Accepted Answer

I'm not sure exactly what "carry over the values from the second column" means, but IIUC you just don't want those values to be missing from your final output. In that case, don't get rid of them:

>>> df["col3"] = df["col3"] / df.groupby("col1")["col3"].transform(sum)
>>> df
  col1 col2      col3
0    a    a  0.333333
1    a    b  0.666667
2    b    c  0.428571
3    b    d  0.571429

Here we've used transform, which means "perform a groupby operation and then broadcast the result back up to the original index":

>>> df.groupby("col1")["col3"].transform(sum)
0    3
1    3
2    7
3    7
dtype: int64

This gives us the right denominators to divide by.

Pandas group proportion with additional columns

Answers (1)

Related Questions