Reputation: 815
I have the code below, which performs a group-by operation followed by a sum.
grouped = df.groupby(by=['Cabin'], as_index=False)['Fare'].sum()
I then rename the columns
grouped.columns = ['Cabin', 'testCol']
I then merge the "grouped" dataframe with my original dataframe so that each row carries the aggregate.
df2 = df.merge(grouped, on='Cabin')
This populates my initial dataframe with the 'testCol' column from my "grouped" dataframe.
Can this code be optimized to fit in one line or something similar?
Upvotes: 1
Views: 93
Reputation: 862611
It seems you need GroupBy.transform to create a new column of sums:
df['testCol'] = df.groupby('Cabin')['Fare'].transform('sum')
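To check that this gives the same result as the groupby + rename + merge version, here is a minimal sketch on a small made-up dataframe. The sample values are an assumption for illustration only and are not the asker's data:

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's df
df = pd.DataFrame({
    "Cabin": ["A", "B", "A", "C", "B"],
    "Fare": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Original three-step approach: aggregate, rename, merge back
grouped = df.groupby(by=["Cabin"], as_index=False)["Fare"].sum()
grouped.columns = ["Cabin", "testCol"]
merged = df.merge(grouped, on="Cabin")

# One-line approach: transform returns a Series aligned to df's index,
# so each row receives the sum of its own group
result = df.copy()
result["testCol"] = result.groupby("Cabin")["Fare"].transform("sum")

print(result)
```

One difference to be aware of: if Cabin contains missing values, the default inner merge should drop those rows, while the transform version keeps every row of df, so it is worth checking which behaviour you want.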
Upvotes: 1