Ellie Biessek
Ellie Biessek

Reputation: 158

How to fill missing values based on grouped average?

My data has missing values for 'Age' and I want to replace them by average based on groupby column 'Title'. After the command:

df.groupby('Title').mean()['Age']

I get a list for example

Mr 32

Miss 21.7

Ms 28

etc.

I tried:

df['Age'].replace(np.nan, 0, inplace=True)
df[(df.Age==0.0)&(df.Title=='Mr')]

to just see the cells where age is missing and title is of one type but it doesn't work.

Question 1. Why the code above doesn't show any cells, despite multiple cells satisfying both conditions at the same time (age = 0.0 and title is mr)

Question2. How can I replace all missing values based on the group average as described above?

Upvotes: 1

Views: 211

Answers (2)

StupidWolf
StupidWolf

Reputation: 46908

I cannot reproduce the first error, so if i use an example like below:

import pandas as pd
import numpy as np
np.random.seed(111)
df = pd.DataFrame({'Title':np.random.choice(['Mr','Miss','Mrs'],20),'Age':np.random.randint(20,50,20)})
df.loc[[5,9,10,11,12],['Age']]=np.nan

the data frame looks like:

Title   Age
0   Mr  42.0
1   Mr  28.0
2   Mr  25.0
3   Mr  32.0
4   Mrs 26.0
5   Miss    NaN
6   Mrs 32.0
7   Mrs 33.0
8   Mrs 25.0
9   Mr  NaN
10  Miss    NaN
11  Mr  NaN
12  Mrs NaN
13  Miss    38.0
14  Mr  31.0
15  Mr  42.0
16  Mr  24.0
17  Mrs 23.0
18  Mrs 49.0
19  Miss    27.0

And we can replace it just doing one more step:

ave_age = df.groupby('Title').mean()['Age']
df.loc[pd.isna(df['Age']),'Age'] = ave_age[df.loc[pd.isna(df['Age']),'Title']].values

Upvotes: 1

CtrlMj
CtrlMj

Reputation: 119

Question 1: Please provide a snippet in order to be able to reproduce the error

Question 2:

Try df['Age'].fillna(f.groupby('Title')['Age'].transform('mean')). This is similar to Pandas: filling missing values by mean in each group

Upvotes: 0

Related Questions