How to fill missing values based on grouped average?

Question

My data has missing values in 'Age', and I want to replace them with the average age of each group in the 'Title' column. After running:

df.groupby('Title').mean()['Age']

I get a list, for example:

Mr      32
Miss    21.7
Ms      28
etc.

I tried:

df['Age'].replace(np.nan, 0, inplace=True)
df[(df.Age==0.0)&(df.Title=='Mr')]

just to see the rows where the age is missing and the title is of one type, but it doesn't work.

Question 1. Why doesn't the code above show any rows, even though multiple rows satisfy both conditions at the same time (age is 0.0 and title is 'Mr')?

Question 2. How can I replace all missing values with the group average as described above?

StupidWolf · Accepted Answer

I cannot reproduce the first error. Take an example like the one below:

import pandas as pd
import numpy as np
np.random.seed(111)
df = pd.DataFrame({'Title':np.random.choice(['Mr','Miss','Mrs'],20),'Age':np.random.randint(20,50,20)})
df.loc[[5,9,10,11,12],['Age']]=np.nan

The data frame looks like:

   Title   Age
0     Mr  42.0
1     Mr  28.0
2     Mr  25.0
3     Mr  32.0
4    Mrs  26.0
5   Miss   NaN
6    Mrs  32.0
7    Mrs  33.0
8    Mrs  25.0
9     Mr   NaN
10  Miss   NaN
11    Mr   NaN
12   Mrs   NaN
13  Miss  38.0
14    Mr  31.0
15    Mr  42.0
16    Mr  24.0
17   Mrs  23.0
18   Mrs  49.0
19  Miss  27.0
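
On Question 1: the rows with a missing age can be viewed without first replacing NaN by 0, by combining isna() with the title condition. A minimal sketch on a small hand-written frame (the values are made up for illustration, and the names missing_mr and missing_per_title are my own, not from the question):

```python
import numpy as np
import pandas as pd

# Small hand-written frame; illustrative values, not the asker's data.
df = pd.DataFrame({
    "Title": ["Mr", "Mr", "Miss", "Mr", "Miss"],
    "Age": [42.0, np.nan, 30.0, np.nan, np.nan],
})

# Rows where Age is missing and Title is 'Mr'; no placeholder value is needed.
missing_mr = df[df["Age"].isna() & (df["Title"] == "Mr")]
print(missing_mr)

# Number of missing ages per title (True counts as 1 when summed).
missing_per_title = df["Age"].isna().groupby(df["Title"]).sum()
print(missing_per_title)
```

If the same selection comes back empty on the real data, one thing worth checking is whether the 'Title' strings match exactly (stray whitespace, different case). That is only a guess, since the behaviour could not be reproduced here.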

The missing ages can then be replaced in one more step:

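# mean age per title, indexed by title
ave_age = df.groupby('Title')['Age'].mean()
# look up the title of each row with a missing age and assign that group's mean
df.loc[pd.isna(df['Age']),'Age'] = ave_age[df.loc[pd.isna(df['Age']),'Title']].values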
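
An alternative that is often used for this kind of fill, given here as a sketch rather than the only way to do it, is groupby(...).transform('mean') combined with fillna. The frame below is hand-written with made-up values so the result is easy to check, and group_mean is just an illustrative name:

```python
import numpy as np
import pandas as pd

# Hand-written example: one missing age in each title group.
df = pd.DataFrame({
    "Title": ["Mr", "Mr", "Mr", "Miss", "Miss", "Miss"],
    "Age": [30.0, 40.0, np.nan, 20.0, np.nan, 24.0],
})

# transform('mean') returns a Series aligned with df's index,
# holding the mean age of each row's own title group (NaN is skipped).
group_mean = df.groupby("Title")["Age"].transform("mean")

# fillna only changes the missing entries; existing ages are kept.
df["Age"] = df["Age"].fillna(group_mean)
print(df)
```

Because the transformed Series is aligned with the original index, no manual lookup of titles is needed. A group whose ages are all missing would still end up as NaN, since it has no mean to fill with.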
