Reputation: 587
I have a dataframe which has a name and size column. I need to create a new column (size_total) that is the total of all unique size values grouped by the name column.
df = pd.DataFrame([
{'sp': 'A', 'size': 25},
{'sp': 'B', 'size': 26},
{'sp': 'C', 'size': 27},
{'sp': 'A', 'size': 25},
{'sp': 'B', 'size': 42},
{'sp': 'C', 'size': 27},
{'sp': 'A', 'size': 25},
{'sp': 'B', 'size': 32},
{'sp': 'C', 'size': 33}])
df['size_total'] = df['sp'].map(df[['sp', 'size']].drop_duplicates(keep='first').groupby('sp').sum()['size'])
Is there a more efficient way to do this?
Upvotes: 3
Views: 73
Reputation: 402553
If you set "sp" as the index first, you can shave off ~30% in timings from your original solution by making pandas do the mapping.
df.set_index(df['sp'].values, inplace=True)
df['size_total'] = (
df.drop_duplicates(['sp', 'size']).groupby('sp')['size'].sum(level=0))
Upvotes: 1