Unable to calculate the aggregated mean

Question

I have a dataset in this form:

Customer_key  purcahse_amount   Date
12633           4435           08/07/2021
34243           7344           11/11/2021
54355           4642           10/11/2020
12633           6322           11/12/2021

Purchase_amount has few Nan values.... I want to replace the Nan values with mean of purcahse_amount but mean should be calculated for that particlar customer_key only. For example, If you see customer_key=12633 is repeating in row 1 and row 4. So, for any row with customer_key=12633, if there is a missing value of purcahse_amount, then it be replaced with mean of that purchase_amount of all rows where customer_key=12633.

user7864386 · Accepted Answer

You can groupby "Customer_key" and then compute the mean of "purchase_amount" and transform it for the DataFrame (we need to transform it to use it in np.where where the values to choose from must be broadcastable). Note that mean method skips NaN values by default, so it transforms the mean of non-NaN values.

Then using np.where, depending on if a "purchase_amount" is NaN or not fill in with group-specific mean or keep original.

means = df.groupby('Customer_key')['purchase_amount'].transform('mean')
df['purchase_amount'] = np.where(df['purchase_amount'].isna(), means, df['purchase_amount'])

or you can use fillna:

df['purchase_amount'] = df['purchase_amount'].fillna(means)

For example, if you had df as below:

   Customer_key  purchase_amount        Date
0         12633           4435.0  08/07/2021
1         34243           7344.0  11/11/2021
2         54355           4642.0  10/11/2020
3         12633           6322.0  11/12/2021
3         12633              NaN  11/12/2021

the both of the above options produce:

   Customer_key  purchase_amount        Date
0         12633           4435.0  08/07/2021
1         34243           7344.0  11/11/2021
2         54355           4642.0  10/11/2020
3         12633           6322.0  11/12/2021
3         12633           5378.5  11/12/2021

Unable to calculate the aggregated mean

Answers (1)

Related Questions