Code Optimization for groupby

Question

I have the code below, which groups the dataframe by 'Cabin' and sums 'Fare' within each group:

grouped = df.groupby(by=['Cabin'], as_index=False)['Fare'].sum()

I then rename the columns:

grouped.columns = ['Cabin', 'testCol']

I then merge the "grouped" dataframe back into my original dataframe, so that each row carries the aggregate for its group:

df2 = df.merge(grouped, on='Cabin')

This populates my initial dataframe with the 'testCol' column from my "grouped" dataframe.

Can this be condensed into a single line, or otherwise simplified?

jezrael · Accepted Answer

It seems you need GroupBy.transform, which returns a Series of group sums aligned to the original index, so it can be assigned directly as a new column:

df['testCol'] = df.groupby('Cabin')['Fare'].transform('sum')
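To check that the two approaches agree, here is a minimal sketch on a small made-up dataframe. The sample values are hypothetical and only the 'Cabin' and 'Fare' column names come from the question. It uses df.assign so the original dataframe is left unmodified, which is a stylistic choice rather than a requirement:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Cabin": ["A", "B", "A", "C", "B"],
    "Fare": [10.0, 20.0, 5.0, 7.5, 2.5],
})

# Original approach: aggregate, rename, then merge back.
grouped = df.groupby(by=["Cabin"], as_index=False)["Fare"].sum()
grouped.columns = ["Cabin", "testCol"]
merged = df.merge(grouped, on="Cabin")

# transform approach: one group sum per original row, aligned to df's index.
result = df.assign(testCol=df.groupby("Cabin")["Fare"].transform("sum"))

print(result)
```

One difference worth keeping in mind: groupby drops NaN keys by default, so with the merge approach (an inner join by default) rows whose 'Cabin' is NaN disappear from df2, whereas with transform those rows are kept and get NaN in 'testCol'.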
