buddy

Reputation: 13

compute aggregated mean with group by

I have a dataframe data like this:

  Cluster  VolumePred  ConversionPred
0     0-3         8.0             7.0
1     0-3       175.0            85.0
2     0-3        17.0             4.0
3     4-6        14.0             4.0
4     7-9        29.0            19.0

I need to add a column "meanKPI" equal to the sum of "ConversionPred" divided by the sum of "VolumePred", grouped by "Cluster".

I tried with this:

def KPI_Pred_mean(x, y):
    #print (x)
    return (x.sum()/y.sum())
    
    #data.ConversionPred.sum()/sum_vol_pred
    
df3=data.groupby(['Cluster'])['ConversionPred', 'VolumePred'].apply(KPI_Pred_mean).reset_index() 

But I got an error:

TypeError: KPI_Pred_mean() missing 1 required positional argument: 'y'

How can I fix this?

Upvotes: 0

Views: 39

Answers (2)

Jan Jaap Meijerink

Reputation: 427

KPI_Pred_mean expects two arguments, but apply calls it with only one: the sub-DataFrame for each group. Passing the function directly is equivalent to .apply(lambda x: KPI_Pred_mean(x)), so the y argument is missing. You can fix your code in two ways:

1 - rewrite lambda

df3 = data.groupby(['Cluster'])[['ConversionPred', 'VolumePred']].apply(lambda x: KPI_Pred_mean(x["ConversionPred"], x["VolumePred"])).reset_index(name='KPI_Pred_mean')

2 - rewrite your function

def KPI_Pred_mean(group):
    # group is the sub-DataFrame for one Cluster value
    return group["ConversionPred"].sum() / group["VolumePred"].sum()

Number 1 is probably better since it keeps your function nice and generic.
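To make the point about the missing argument concrete, here is a minimal sketch of option 1 on the sample data from the question. The inspect wrapper and the calls list are hypothetical helpers added only to show what apply hands to the function; they are not part of the original code.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "Cluster": ["0-3", "0-3", "0-3", "4-6", "7-9"],
    "VolumePred": [8.0, 175.0, 17.0, 14.0, 29.0],
    "ConversionPred": [7.0, 85.0, 4.0, 4.0, 19.0],
})

def KPI_Pred_mean(x, y):
    # Generic ratio of two sums; unchanged from the question.
    return x.sum() / y.sum()

calls = []  # hypothetical: records the type of object apply passes in

def inspect(group):
    # apply passes ONE object per group (a sub-DataFrame),
    # so the two columns have to be unpacked here.
    calls.append(type(group).__name__)
    return KPI_Pred_mean(group["ConversionPred"], group["VolumePred"])

df3 = (
    data.groupby("Cluster")[["ConversionPred", "VolumePred"]]
    .apply(inspect)
    .reset_index(name="KPI_Pred_mean")
)
print(df3)
```

Selecting the two columns with double brackets before apply keeps the grouping column out of each sub-DataFrame, which should also avoid the deprecation warning that recent pandas versions emit when apply operates on the grouping columns.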

Upvotes: 0

BENY

Reputation: 323276

Change how you call apply with your self-defined function:

out = data.groupby(['Cluster']).apply(lambda x: KPI_Pred_mean(x['ConversionPred'], x['VolumePred'])).reset_index(name='KPI_Pred_mean')
Out[267]: 
  Cluster  KPI_Pred_mean
0     0-3       0.480000
1     4-6       0.285714
2     7-9       0.655172
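The result above has one row per cluster. The question asks for a "meanKPI" column on the original frame, and one possible way to get that is with groupby transform, which broadcasts each group's sum back to its rows. This is a sketch of an alternative, not something the answers above propose; it assumes the sample data from the question.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "Cluster": ["0-3", "0-3", "0-3", "4-6", "7-9"],
    "VolumePred": [8.0, 175.0, 17.0, 14.0, 29.0],
    "ConversionPred": [7.0, 85.0, 4.0, 4.0, 19.0],
})

grouped = data.groupby("Cluster")

# transform("sum") returns a Series aligned with the original rows,
# so the division yields one ratio per row, repeated within each cluster.
data["meanKPI"] = (
    grouped["ConversionPred"].transform("sum")
    / grouped["VolumePred"].transform("sum")
)
print(data)
```

Every row in a cluster then carries that cluster's ratio, so no merge back onto the original frame is needed.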

Upvotes: 1
