Reputation: 2164
I have the following df
d = {'CAT':['C1','C2','C1','C2'],'A': [10, 20,30,40], 'B': [3, 4,10,3]}
df1 = pd.DataFrame(data=d)
I am trying to include a new column obtained by dividing 'A' by the highest 'B' it is category ('CAT'). That is, I want to divide 10 by 10, 20 by 4, 10 by 10 and 40 by 4 to obtain the following df
d = {'CAT':['C1','C2','C1','C2'],'A': [10, 20,30,40], 'B': [3, 4,10,3], 'C':[1,5,3,10]}
Any suggestions?
I find it easy to do without having to condition/groupby on CAT
d = {'A': [10, 20,30,40], 'B': [3, 4,10,3]}
df1 = pd.DataFrame(data=d)
df1 = df1.apply(lambda x:x.A/max(df1['B']),axis=1)
but with 'CAT' I am having a hard time.
Upvotes: 2
Views: 49
Reputation: 447
you're pretty much most of the way there with using apply. Depending on how big your actual dataset it, using apply could work out as inefficient, but ignoring that, you can solve your problem by the 'max' function on a filter of the dataframe rather than the df itself.
Or, just to get to the code:
df1['calculation'] = df1.apply(lambda row: row['A'] / max(df1[df1['CAT'] == row['CAT']]['B']), axis=1)
Upvotes: 0
Reputation: 28699
You could do this in one line; I only broke it into separate lines for more clarity. transform allows replication of the groupby accross the entire dataframe; with that we can get the results for column C :
grouping = df1.groupby("CAT").B.transform("max")
df1['C'] = df1.A.div(grouping)
df1
CAT A B C
0 C1 10 3 1.0
1 C2 20 4 5.0
2 C1 30 10 3.0
3 C2 40 3 10.0
Upvotes: 1