Manipulate A Group Column in Pandas

Question

I have a data set with columns Dist, Class, and Count.

I want to group the data set by Dist and divide the Count column of each group by that group's total count, so that the counts within each group sum to one.

The following MWE shows my approach so far. Is there a more compact, more idiomatic pandas way of writing this?

import pandas as pd
import numpy as np

a = np.random.randint(0,4,(10,3))
s = pd.DataFrame(a,columns=['Dist','Class','Count'])

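def manipcolumn(x):
    # x is the sub-DataFrame holding the rows of one Dist group
    csum = x['Count'].sum()
    # divide every count in the group by the group total
    x['Count'] = x['Count'].apply(lambda c: c/csum)
    return x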

s.groupby('Dist').apply(manipcolumn)
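
One possibly more compact route is to let `groupby(...).transform('sum')` compute each row's group total and divide by it directly, without a helper function. The sketch below is not a definitive answer: the small hand-written frame and the name `normalized` are made up for illustration, and it assumes no group has a total Count of zero (the random data in the MWE could produce such a group, which would yield NaN or inf).

```python
import pandas as pd

# Hypothetical data for illustration; every Dist group has a nonzero total.
s = pd.DataFrame({
    'Dist':  [0, 0, 1, 1, 1, 2],
    'Class': [1, 2, 0, 3, 1, 2],
    'Count': [1, 3, 2, 2, 4, 5],
})

# transform('sum') returns a Series aligned with s's index in which each row
# holds the total Count of that row's Dist group.
group_totals = s.groupby('Dist')['Count'].transform('sum')

# assign returns a new DataFrame and leaves s itself unchanged.
normalized = s.assign(Count=s['Count'] / group_totals)

print(normalized)
```

Because the division is done on whole columns, this avoids the per-element `apply` and keeps the original row order and index.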
