Reputation: 537
I have a dataset containing four columns similar to the first four columns shown below. I want to add another column that shows the geometric mean of the values in 'price' for certain groups, where each group is determined by the column 'type'.
How can I do this? The result would be a column like the one labeled 'geomean_price_bytype' below.
Upvotes: 3
Views: 1167
Reputation: 115
So basically you have three groups, and you want to create a new column whose value depends on the group each row belongs to.
def meanByGroup(x):
    if x == 111:
        return 245474
    elif x == 222:
        return 194223
    elif x == 333:
        return 124122
Then df["geomean_price_bytype"] = df["type"].apply(meanByGroup)
Upvotes: 0
Reputation: 862681
Use GroupBy.transform with gmean:
from scipy.stats.mstats import gmean
#if necessary remove `,` and `$`
#df['price'] = df['price'].str.lstrip('$').str.replace(',', '').astype(int)
df['new'] = df.groupby('type')['price'].transform(gmean)
Or custom lambda function:
gmean1 = lambda x: x.product() ** (1 / float(len(x)))
df['new'] = df.groupby('type')['price'].transform(gmean1)
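One caveat with the product-based lambda: for large groups or large prices the product can overflow before the root is taken. A minimal sketch of a log-based variant, which is mathematically equivalent for strictly positive values, is below. The sample data and the helper name `log_gmean` are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "type": [111, 111, 222, 222, 333],
    "price": [100.0, 400.0, 10.0, 1000.0, 50.0],
})

def log_gmean(prices):
    # exp(mean(log(x))) equals the n-th root of the product,
    # assuming every price is strictly positive.
    return np.exp(np.log(prices).mean())

# transform broadcasts each group's scalar result back to that group's rows.
df["geomean_price_bytype"] = df.groupby("type")["price"].transform(log_gmean)
print(df)
```

This assumes 'price' is already numeric; if it still contains `$` or `,`, clean it first as in the commented line above.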
Upvotes: 3