Blaine

Reputation: 677

How to fill NaN based on groupby transform without losing the column grouped by?

I have a dataset containing heights, weights, etc., and I want to fill the NaN values in each column with the mean value for that row's gender.

Example dataset:

    gender    height    weight
1     M          5       NaN
2     F          4       NaN
3     F         NaN        40
4     M         NaN        50

My current code:

df = df.groupby("gender").transform(lambda x: x.fillna(x.mean()))

Current output:

     height    weight
1       5        50
2       4        40
3       4        40
4       5        50

Expected output:

    gender    height    weight
1     M          5        50
2     F          4        40
3     F          4        40
4     M          5        50

Unfortunately, this drops the gender column, which I need later on.
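
For anyone who wants to reproduce this, here is a minimal sketch. The DataFrame construction is an assumption reconstructed from the example table above (index 1 to 4, NaN for the missing cells); the transform call is the one from the question.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example dataset from the table above.
df = pd.DataFrame(
    {
        "gender": ["M", "F", "F", "M"],
        "height": [5, 4, np.nan, np.nan],
        "weight": [np.nan, np.nan, 40, 50],
    },
    index=[1, 2, 3, 4],
)

# transform works on the non-grouping columns only, so the result
# has no 'gender' column even though the values are filled correctly.
filled = df.groupby("gender").transform(lambda x: x.fillna(x.mean()))

print(filled.columns.tolist())
print(filled)
```

The original df is untouched at this point; the column only disappears because the result is assigned back over df.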

Upvotes: 1

Views: 210

Answers (2)

sophocles

Reputation: 13841

How about looping through the two columns you want to fill and applying GroupBy.transform to each one, grouping by 'gender'? Each result is a single column assigned back into df, so 'gender' is left in place:

for col in ['height','weight']:
    df[col] = df.groupby('gender')[col].transform(lambda x: x.fillna(x.mean()))

print(df)

  gender  height  weight
0      M     5.0    50.0
1      F     4.0    40.0
2      F     4.0    40.0
3      M     5.0    50.0

If you want to fill every numeric column that contains NaN values, you can collect those columns in a list and apply the same approach:

features_to_impute = [
    x for x in df.columns if df[x].dtype != 'O' and df[x].isnull().any()
]

for col in features_to_impute:
    df[col] = df.groupby('gender')[col].transform(lambda x: x.fillna(x.mean()))
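
As a possible variation on the loop above (not part of the original approach), the same fill can be sketched without a Python-level loop. This assumes the toy data from the question; `num_cols` and `group_means` are names introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the question's example (assumed construction).
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "height": [5, 4, np.nan, np.nan],
    "weight": [np.nan, np.nan, 40, 50],
})

# Numeric columns only, so the string column 'gender' is excluded.
num_cols = df.select_dtypes(include="number").columns.tolist()

# Group means broadcast back to the original row order and index.
group_means = df.groupby("gender")[num_cols].transform("mean")

# Passing a DataFrame to fillna fills each NaN from the cell with the
# same index label and column name; non-missing values are untouched.
df[num_cols] = df[num_cols].fillna(group_means)

print(df)
```

Because only the numeric columns are reassigned, 'gender' stays in df. Whether this is preferable to the loop is mostly a matter of taste for a handful of columns.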

Upvotes: 1

ashkangh

Reputation: 1624

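Instead of using transform, you can try GroupBy.apply as below. On newer pandas versions, mean needs numeric_only=True to skip the string column, and group_keys=False avoids adding 'gender' as an extra index level:

df = df.groupby('gender', group_keys=False).apply(lambda x: x.fillna(x.mean(numeric_only=True)))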

Upvotes: 0
