Reputation: 677
I have a dataset containing heights, weights, etc., and I want to fill the NaN values in each column with the mean of that column for the row's gender.
Example dataset:
gender height weight
1 M 5 NaN
2 F 4 NaN
3 F NaN 40
4 M NaN 50
df = df.groupby("gender").transform(lambda x: x.fillna(x.mean()))
Current output:
height weight
1 5 50
2 4 40
3 4 40
4 5 50
Expected output:
gender height weight
1 M 5 50
2 F 4 40
3 F 4 40
4 M 5 50
Unfortunately, this drops the gender column, which I need later on.
Upvotes: 1
Views: 210
Reputation: 13841
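How about looping through the two columns you want to fill and calling GroupBy.transform on each, grouping by 'gender'? Selecting one column at a time and assigning the result back leaves the gender column in place:
for col in ['height', 'weight']:
    # Replace NaN in this column with the mean of the row's gender group
    df[col] = df.groupby('gender')[col].transform(lambda x: x.fillna(x.mean()))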
print(df)
gender height weight
0 M 5.0 50.0
1 F 4.0 40.0
2 F 4.0 40.0
3 M 5.0 50.0
If you want to fill all the numerical columns that contain missing values, you can collect them in a list and apply the same approach:
# Non-object columns that contain at least one NaN
features_to_impute = [
    x for x in df.columns if df[x].dtypes != 'O' and df[x].isnull().mean() > 0
]
for col in features_to_impute:
    df[col] = df.groupby('gender')[col].transform(lambda x: x.fillna(x.mean()))
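The loop can also be avoided. The following is a minimal sketch, not part of the original answer, that computes the per-gender means once with transform('mean') and passes them to fillna. The sample frame is rebuilt from the question's example, and the names num_cols and group_means are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the example in the question
df = pd.DataFrame(
    {
        "gender": ["M", "F", "F", "M"],
        "height": [5, 4, np.nan, np.nan],
        "weight": [np.nan, np.nan, 40, 50],
    }
)

# Numeric columns to impute (hypothetical helper name)
num_cols = df.select_dtypes(include="number").columns.tolist()

# Per-gender means, aligned row by row with the original frame
group_means = df.groupby("gender")[num_cols].transform("mean")

# Fill only the missing cells; the gender column is never touched
df[num_cols] = df[num_cols].fillna(group_means)
print(df)
```

Because only the numeric columns are assigned back, gender stays in the frame, and the lambda is replaced by the built-in 'mean' aggregation.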
Upvotes: 1
Reputation: 1624
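Instead of using transform, you can use groupby with apply, which passes each group to the function as a whole DataFrame:
# numeric_only=True skips the string column when computing the means;
# how apply treats the grouping column depends on the pandas version
df = df.groupby('gender', group_keys=False).apply(lambda x: x.fillna(x.mean(numeric_only=True)))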
Upvotes: 0