Reputation: 77
I have this data:
import numpy as np
import pandas as pd
group = {'gender': ['male', 'female', 'female', 'male', 'female', 'male', 'male'],
'height': [175, 168, np.nan, 170, 167, np.nan, 190],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
df = pd.DataFrame(group, index=labels)
df2 = df.groupby('gender')['height'].mean()
and I want to fill the NaN values in height with the per-gender mean from df2.
Upvotes: 3
Views: 1217
Reputation: 5334
Code:
import pandas as pd
import numpy as np
group = {'gender': ['male', 'female', 'female', 'male', 'female', 'male', 'male'],
'height': [175, 168, np.nan, 170, 167, np.nan, 190],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
df = pd.DataFrame(group, index=labels)
df2 = df.groupby('gender')['height'].mean()
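# Map each row's gender to its group mean, then fill only the missing heights.
# Assigning back avoids calling fillna(inplace=True) on a column selection,
# which newer pandas versions warn about and may not apply to df.
df['height'] = df['height'].fillna(df['gender'].map(df2))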
# print(df2)
print(df)
Output:
   gender      height
a    male  175.000000
b  female  168.000000
c  female  167.500000
d    male  170.000000
e  female  167.000000
f    male  178.333333
g    male  190.000000
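To see why this works, it may help to look at the intermediate Series. Mapping the gender column through df2 looks up each row's group mean and returns a Series aligned with df's index; fillna then uses those values only where height is missing. A minimal sketch on the same data (the name per_row_mean is illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["male", "female", "female", "male", "female", "male", "male"],
        "height": [175, 168, np.nan, 170, 167, np.nan, 190],
    },
    index=list("abcdefg"),
)

# Per-group means, indexed by gender; mean() skips NaN by default.
df2 = df.groupby("gender")["height"].mean()

# Look up each row's group mean; the result shares df's index.
per_row_mean = df["gender"].map(df2)
print(per_row_mean)

# Fill only the missing heights and assign the result back to the column.
df["height"] = df["height"].fillna(per_row_mean)
print(df)
```

Rows that already have a height are left untouched, because fillna only consults per_row_mean at positions where the original value is NaN.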
Upvotes: 3
Reputation: 164673
You can use groupby + transform with mean, then fillna with the resulting series.
means = df.groupby('gender')['height'].transform('mean')
df['height'] = df['height'].fillna(means)
print(df)
   gender      height
a    male  175.000000
b  female  168.000000
c  female  167.500000
d    male  170.000000
e  female  167.000000
f    male  178.333333
g    male  190.000000
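The difference from a plain groupby mean is that transform broadcasts each group's mean back to every row, so the result has the same index as df and can be passed to fillna directly. A small self-contained sketch, assuming the same data as the question; the names filled, via_map and same are illustrative only, and the comparison with the map approach is added here just as a sanity check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["male", "female", "female", "male", "female", "male", "male"],
        "height": [175, 168, np.nan, 170, 167, np.nan, 190],
    },
    index=list("abcdefg"),
)

# transform('mean') returns one value per row (the row's group mean),
# indexed like df rather than by gender.
means = df.groupby("gender")["height"].transform("mean")
print(means)

# Fill the gaps without modifying df, so the two approaches can be compared.
filled = df["height"].fillna(means)

# Same fill done via a per-gender lookup table and map.
group_means = df.groupby("gender")["height"].mean()
via_map = df["height"].fillna(df["gender"].map(group_means))

# Both routes should agree on every row.
same = np.allclose(filled, via_map)
print(same)
```

The transform version skips the separate lookup table, which tends to be convenient when the per-group means are not needed for anything else.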
Upvotes: 3