SimpleImputer with groupby

Question

Suppose I have the following dataset:

	code	category	energy	sugars	proteins
0	01	B	936	NaN	7.8
1	02	NaN	NaN	15.0	NaN
2	03	A	1569.0	23	4.1
3	04	NaN	826	NaN	3
4	05	B	1345	22	5.1
5	06	A	NaN	17	NaN
6	10	C	826	NaN	3
7	11	C	1345	26	5.1
8	101	B	NaN	18	6.1
9	102	B	636	NaN	7.8
10	103	NaN	NaN	15.0	NaN
11	104	A	1569.0	23	4.1
12	105	C	813	NaN	3.5

I would like to impute the missing values with SimpleImputer, taking the category column into account.

Namely, I would like to fill each missing value with the mean for the product's category.
If a product has no category, I would like to use the mean of the products without a category.

So, to fill in sugars for code 01, I would only consider the sugars values of products in category B:

	code	category	energy	sugars	proteins
0	01	B	936	NaN	7.8
4	05	B	1345	22	5.1
8	101	B	NaN	18	6.1
9	102	B	636	NaN	7.8
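
As a quick check of the target value, here is a minimal sketch that rebuilds only the four category B rows shown above and computes their mean sugars. The names `b`, `b_mean_sugars` and `b_filled` are just illustrative and are not part of my real code.

```python
import numpy as np
import pandas as pd

# Category B rows copied from the table above
b = pd.DataFrame({
    "code": ["01", "05", "101", "102"],
    "category": ["B", "B", "B", "B"],
    "energy": [936, 1345, np.nan, 636],
    "sugars": [np.nan, 22, 18, np.nan],
    "proteins": [7.8, 5.1, 6.1, 7.8],
})

# Series.mean() skips NaN by default, so only 22 and 18 are averaged
b_mean_sugars = b["sugars"].mean()
print(b_mean_sugars)  # 20.0

# The value that codes 01 and 102 should receive for sugars
b_filled = b.fillna({"sugars": b_mean_sugars})
```

So code 01 (and code 102) should end up with sugars = 20.0.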

I did something similar, as shown below, but I need to do it with SimpleImputer.
To clarify, in the code below, NaN values in rows without a category are filled with the overall mean of the column.

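for col in df.columns:
    if df[col].dtypes == "float64":
        # Rows with a category: fill NaN with the mean of that category
        df.loc[df[col].isna() & df["category"].notnull(), col] = df["category"].map(df.groupby("category")[col].mean())
        # Remaining NaN (rows without a category): fill with the overall column mean
        df[col] = df[col].fillna(df[col].mean())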
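
For reference, one possible way to get the behaviour described above with SimpleImputer is to fit a separate imputer on each category group. This is only a sketch under some assumptions: the column list `num_cols`, the helper Series `keys` and the `"__missing__"` placeholder label are names made up for illustration, and every group is assumed to have at least one non-missing value in each numeric column (otherwise SimpleImputer drops that column by default and the assignment would fail).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# The dataset from the question
df = pd.DataFrame({
    "code": ["01", "02", "03", "04", "05", "06", "10",
             "11", "101", "102", "103", "104", "105"],
    "category": ["B", np.nan, "A", np.nan, "B", "A", "C",
                 "C", "B", "B", np.nan, "A", "C"],
    "energy": [936, np.nan, 1569, 826, 1345, np.nan, 826,
               1345, np.nan, 636, np.nan, 1569, 813],
    "sugars": [np.nan, 15, 23, np.nan, 22, 17, np.nan,
               26, 18, np.nan, 15, 23, np.nan],
    "proteins": [7.8, np.nan, 4.1, 3, 5.1, np.nan, 3,
                 5.1, 6.1, 7.8, np.nan, 4.1, 3.5],
})

num_cols = ["energy", "sugars", "proteins"]  # columns to impute

# Hypothetical placeholder label so that rows without a category
# form their own group instead of being dropped by groupby
keys = df["category"].fillna("__missing__")

# Fit one mean imputer per group and write the result back in place
for _, idx in df.groupby(keys).groups.items():
    imputer = SimpleImputer(strategy="mean")
    df.loc[idx, num_cols] = imputer.fit_transform(df.loc[idx, num_cols])

print(df)
```

With this sketch, rows without a category are imputed from the other rows without a category, not from the overall column mean, which differs from the loop above.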
