StatsScared

Reputation: 537

Calculating geometric mean by group

I have a dataset containing four columns similar to the first four columns shown below. I want to add another column that shows the geometric mean of the values in 'price' for certain groups, where each group is determined by the column 'type'.

How can I do this? The result would be a column like the one labeled 'geomean_price_bytype' below.


Upvotes: 3

Views: 1167

Answers (2)

not_overrated

Reputation: 115

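You have three groups, and you want a new column whose value depends on the group. One way is to map each 'type' value to its group value by hand:

def meanByGroup(x):
    # Values are hard-coded per 'type'
    if x == 111:
        return 245474
    elif x == 222:
        return 194223
    elif x == 333:
        return 124122
    # any other 'type' falls through and returns None

Then apply the function to the 'type' column:

df["geomean_price_bytype"] = df["type"].apply(meanByGroup)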
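Hard-coding the three values only works for this exact dataset. A minimal sketch of deriving the lookup from the data instead, assuming positive prices and a hypothetical toy frame (the sample values below are made up for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real values come from the asker's dataset.
df = pd.DataFrame({
    "type": [111, 111, 222, 222, 333],
    "price": [100.0, 400.0, 10.0, 1000.0, 50.0],
})

# Geometric mean per group = exp(mean(log(price))); requires positive prices.
geomean_by_type = np.exp(np.log(df["price"]).groupby(df["type"]).mean())

# Look up each row's group value, like the if/elif chain but computed from the data.
df["geomean_price_bytype"] = df["type"].map(geomean_by_type)
```

Here geomean_by_type plays the role of the if/elif chain, so new 'type' values are handled without editing the function.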

Upvotes: 0

jezrael

Reputation: 862681

Use GroupBy.transform with gmean:

from scipy.stats.mstats import gmean

# if necessary, remove `,` and `$` first
# df['price'] = df['price'].str.lstrip('$').str.replace(',', '').astype(int)

df['new'] = df.groupby('type')['price'].transform(gmean)

Or use a custom lambda function:

# cast to float first: the product of integer prices can silently overflow int64
gmean1 = lambda x: x.astype(float).product() ** (1.0 / len(x))
df['new'] = df.groupby('type')['price'].transform(gmean1)
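Multiplying every price in a group can overflow (integers) or reach inf (floats) when groups are large. A sketch of an equivalent log-based version, assuming all prices are strictly positive; the helper name gmean_log and the sample data are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "type": [111, 111, 111, 222, 222],
    "price": [2, 4, 8, 3, 27],
})

def gmean_log(x):
    # exp(mean(log(x))) equals the n-th root of the product for positive x,
    # but never forms the (possibly huge) product itself.
    return np.exp(np.log(x).mean())

# transform broadcasts each group's scalar result back to that group's rows.
df["new"] = df.groupby("type")["price"].transform(gmean_log)
```

For the toy data, each row of type 111 should get the cube root of 2*4*8 and each row of type 222 the square root of 3*27, up to floating-point rounding.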

Upvotes: 3
