Reputation: 1063
I've tried several ideas posted on the forums but none is quite working. I have a dataframe of product identifiers and prices. I have already narrowed the df to only where the same product has more than one price in the larger database. Now I want to create a new column that will be the average price of a given product. I.e.:
ID Price
ABC1 101.45
XYZ2 88.12
ABC1 99.24
XYZ2 82.99
ABC1 105.00
The output I want is as such:
ID Price AvgPx
ABC1 101.45 101.897
XYZ2 88.12 85.555
ABC1 99.24 101.897
XYZ2 82.99 85.555
ABC1 105.00 101.897
I've tried various versions of groupby and for loops and nothing quite works. Thanks for your help!
Upvotes: 1
Views: 3738
Reputation: 51395
While the other solutions offered work fine, I would argue that using transform here leads to clean, easy-to-read code, because it returns one value per original row and can be assigned straight to a new column:
df['AvgPx'] = df.groupby('ID')['Price'].transform('mean')
>>> df
ID Price AvgPx
0 ABC1 101.45 101.896667
1 XYZ2 88.12 85.555000
2 ABC1 99.24 101.896667
3 XYZ2 82.99 85.555000
4 ABC1 105.00 101.896667
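For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the sample rows in the question; the rounding at the end is only an assumption about how you might want to display the result and is not part of the answer itself.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ABC1", "XYZ2", "ABC1", "XYZ2", "ABC1"],
    "Price": [101.45, 88.12, 99.24, 82.99, 105.00],
})

# transform("mean") computes the mean per ID group and broadcasts it back
# to every row of that group, aligned to df's original index.
df["AvgPx"] = df.groupby("ID")["Price"].transform("mean")

# Optional display step (an assumption): round the new column to 3 decimals.
print(df.round({"AvgPx": 3}))
```

Because the result keeps the original index, the rows stay in the same order and no join or merge is needed.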
Upvotes: 5
Reputation: 249394
You can do this:
avg = df.groupby('ID').Price.mean()
df.join(avg, on='ID', rsuffix='Avg')
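It would be nicer to say df['AvgPx'] = avg.reindex(df.ID), but that doesn't work as written: the reindexed series is labeled by ID, which contains duplicates, so pandas cannot align it with df's own index when assigning the column.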
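A hedged sketch of both routes, using the sample rows from the question. The first part is the join from this answer, with the result kept in a variable (here called joined, a name chosen for illustration) since join returns a new frame. The second part swaps in Series.map, which is not what the answer proposed but seems to give the one-line assignment it was reaching for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ABC1", "XYZ2", "ABC1", "XYZ2", "ABC1"],
    "Price": [101.45, 88.12, 99.24, 82.99, 105.00],
})

# One mean per product, indexed by ID.
avg = df.groupby("ID")["Price"].mean()

# join returns a new DataFrame; because both sides have a "Price" column,
# the right-hand one gets the suffix and is named "PriceAvg".
joined = df.join(avg, on="ID", rsuffix="Avg")

# Alternative (not from the answer): map looks each row's ID up in avg
# and keeps df's row index, so direct assignment works.
df["AvgPx"] = df["ID"].map(avg)

print(joined)
print(df)
```

Both variants keep the rows in their original order; they differ only in the name of the new column.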
Upvotes: 2
Reputation: 555
You can create an aggregate version of the dataframe, then use merge to join your original dataframe with your aggregate.
agg_df = df.groupby('ID', as_index=False)['Price'].mean().rename(columns={'Price': 'AvgPx'})
df = df.merge(agg_df)
ID Price AvgPx
0 ABC1 101.45 101.896667
1 ABC1 99.24 101.896667
2 ABC1 105.00 101.896667
3 XYZ2 88.12 85.555000
4 XYZ2 82.99 85.555000
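Note that the output above lists the rows grouped by ID rather than in their original order. If the original order matters, one option is to spell out the merge key and use a left merge. This is a minimal sketch under that assumption; the variable name merged is illustrative, and the data is rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ABC1", "XYZ2", "ABC1", "XYZ2", "ABC1"],
    "Price": [101.45, 88.12, 99.24, 82.99, 105.00],
})

# Aggregate to one row per ID and name the mean column AvgPx.
agg_df = (
    df.groupby("ID", as_index=False)["Price"]
      .mean()
      .rename(columns={"Price": "AvgPx"})
)

# Being explicit about the key avoids surprises if the frames ever share
# other column names; how="left" keeps the row order of df.
merged = df.merge(agg_df, on="ID", how="left")

print(merged)
```

The values are the same as in the answer's output; only the row order and the explicit key differ.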
Upvotes: 2