Jason

Reputation: 333

Pandas groupby - divide by the sum of all groups

I have a DataFrame df and I create gb = df.groupby("column1"). Now I would like to do the following:

x = gb.apply(lambda x: x["column2"].sum() / df["column2"].sum())

It works, but I would like to base everything on x alone rather than on both x and df. Ideally there would be a method such as x.get_source_df, and then my solution would be:

x = gb.apply(lambda x: x["column2"].sum() / x.get_source_df()["column2"].sum())

In that case I could store this lambda function in a dictionary and reuse it for any df. Is that possible?

Upvotes: 1

Views: 623

Answers (2)

PTQuoc

Reputation: 1083

From your explanation I am not sure whether you want to divide by the sum of each group or by the sum of the entire DataFrame. I assume you want to divide by the sum of each group.

Data:

import pandas as pd

df = pd.DataFrame({'name':['a']*5+['b']*5,
                   'year':[2001,2002,2003,2004,2005]*2,
                   'val1':[1,2,3,4,5,None,7,8,9,10],
                   'val2':[21,22,23,24,25,26,27,28,29,30]})

Use transform to broadcast each group's sum back onto its rows, then simply divide column by column:

df['sum'] = df.groupby('name')['val1'].transform('sum')
df['weight'] = df['val1']/df['sum']
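As a quick check of the approach above, here is a minimal sketch on a trimmed copy of the same data. The names group_sum and weight_totals are only illustrative. It assumes the default behaviour where sum skips missing values, so the row with a missing val1 gets a missing weight while the remaining weights in each group still add up to 1.

```python
import pandas as pd

# Trimmed copy of the sample data: two groups, one missing value in group 'b'.
df = pd.DataFrame({
    "name": ["a"] * 5 + ["b"] * 5,
    "val1": [1, 2, 3, 4, 5, None, 7, 8, 9, 10],
})

# transform("sum") returns a Series aligned with df's rows,
# holding the sum of the group each row belongs to.
group_sum = df.groupby("name")["val1"].transform("sum")

# Each row's share of its own group's total.
df["weight"] = df["val1"] / group_sum

# Sanity check: weights within each group should add up to 1
# (the missing value is skipped by sum).
weight_totals = df.groupby("name")["weight"].sum()
print(weight_totals)
```

Passing the string "sum" instead of a lambda lets pandas use its built-in aggregation, which is usually faster on large frames.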

Upvotes: 1

ansev

Reputation: 30940

You should not use apply here. In case it is useful, a faster, vectorized approach would be:

df.groupby('column1')['column2'].sum().div(df['column2'].sum())

It works for more than one column too, as long as you select the same list of columns on both sides of the division.
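To tie this back to the question's idea of keeping reusable functions in a dictionary, one possible sketch is to make the function take the DataFrame itself rather than the groupby object, so it never needs to look up a source frame. The names group_share and metrics are hypothetical, and the sample data is borrowed from the other answer.

```python
import pandas as pd

def group_share(df, key, cols):
    """Each group's sum of `cols` divided by the sum over the whole frame."""
    # For a single column name this is Series / scalar; for a list of
    # columns it is DataFrame / Series, aligned on the column labels.
    return df.groupby(key)[cols].sum().div(df[cols].sum())

# Hypothetical registry of reusable calculations, usable with any DataFrame.
metrics = {"share_of_total": group_share}

df = pd.DataFrame({
    "name": ["a"] * 5 + ["b"] * 5,
    "val1": [1, 2, 3, 4, 5, None, 7, 8, 9, 10],
    "val2": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
})

one_col = metrics["share_of_total"](df, "name", "val2")
two_cols = metrics["share_of_total"](df, "name", ["val1", "val2"])
print(one_col)
print(two_cols)
```

Because the function receives df directly, the same dictionary entry can be applied to any frame and any grouping column.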

Upvotes: 1
