Reputation: 158
My data has missing values for 'Age' and I want to replace them by average based on groupby column 'Title'. After the command:
df.groupby('Title').mean()['Age']
I get a list for example
Mr 32
Miss 21.7
Ms 28
etc.
I tried:
df['Age'].replace(np.nan, 0, inplace=True)
df[(df.Age==0.0)&(df.Title=='Mr')]
to just see the cells where age is missing and title is of one type but it doesn't work.
Question 1. Why the code above doesn't show any cells, despite multiple cells satisfying both conditions at the same time (age = 0.0 and title is mr)
Question2. How can I replace all missing values based on the group average as described above?
Upvotes: 1
Views: 211
Reputation: 46908
I cannot reproduce the first error, so if i use an example like below:
import pandas as pd
import numpy as np
np.random.seed(111)
df = pd.DataFrame({'Title':np.random.choice(['Mr','Miss','Mrs'],20),'Age':np.random.randint(20,50,20)})
df.loc[[5,9,10,11,12],['Age']]=np.nan
the data frame looks like:
Title Age
0 Mr 42.0
1 Mr 28.0
2 Mr 25.0
3 Mr 32.0
4 Mrs 26.0
5 Miss NaN
6 Mrs 32.0
7 Mrs 33.0
8 Mrs 25.0
9 Mr NaN
10 Miss NaN
11 Mr NaN
12 Mrs NaN
13 Miss 38.0
14 Mr 31.0
15 Mr 42.0
16 Mr 24.0
17 Mrs 23.0
18 Mrs 49.0
19 Miss 27.0
And we can replace it just doing one more step:
ave_age = df.groupby('Title').mean()['Age']
df.loc[pd.isna(df['Age']),'Age'] = ave_age[df.loc[pd.isna(df['Age']),'Title']].values
Upvotes: 1
Reputation: 119
Question 1: Please provide a snippet in order to be able to reproduce the error
Question 2:
Try df['Age'].fillna(f.groupby('Title')['Age'].transform('mean'))
. This is similar to Pandas: filling missing values by mean in each group
Upvotes: 0