k88074
k88074

Reputation: 2164

How to use groupby max in own groupby function?

I have the following df

d = {'CAT':['C1','C2','C1','C2'],'A': [10, 20,30,40], 'B': [3, 4,10,3]}
df1 = pd.DataFrame(data=d)

I am trying to include a new column obtained by dividing 'A' by the highest 'B' it is category ('CAT'). That is, I want to divide 10 by 10, 20 by 4, 10 by 10 and 40 by 4 to obtain the following df

d = {'CAT':['C1','C2','C1','C2'],'A': [10, 20,30,40], 'B': [3, 4,10,3], 'C':[1,5,3,10]}

Any suggestions?



I find it easy to do without having to condition/groupby on CAT

d = {'A': [10, 20,30,40], 'B': [3, 4,10,3]}
df1 = pd.DataFrame(data=d)
df1 = df1.apply(lambda x:x.A/max(df1['B']),axis=1)

but with 'CAT' I am having a hard time.

Upvotes: 2

Views: 49

Answers (2)

houseofleft
houseofleft

Reputation: 447

you're pretty much most of the way there with using apply. Depending on how big your actual dataset it, using apply could work out as inefficient, but ignoring that, you can solve your problem by the 'max' function on a filter of the dataframe rather than the df itself.

Or, just to get to the code:

df1['calculation'] = df1.apply(lambda row: row['A'] / max(df1[df1['CAT'] == row['CAT']]['B']), axis=1)

Upvotes: 0

sammywemmy
sammywemmy

Reputation: 28699

You could do this in one line; I only broke it into separate lines for more clarity. transform allows replication of the groupby accross the entire dataframe; with that we can get the results for column C :

grouping = df1.groupby("CAT").B.transform("max")
df1['C'] = df1.A.div(grouping)
df1



    CAT A   B   C
0   C1  10  3   1.0
1   C2  20  4   5.0
2   C1  30  10  3.0
3   C2  40  3   10.0

Upvotes: 1

Related Questions