mateusvl

How can I fill NaN in Pandas Dataframe with median of each category?

I have a pandas DataFrame like this:

Age Gender City
10000 Male Tokyo
15000 Male Tokyo
20000 Male Tokyo
12000 Female Madrid
14000 Female Madrid
16000 Female Madrid
15000 Female Rome
NaN Female Rome
NaN Male Tokyo
NaN Female Rome

For those last 3 rows, I'd like to impute the median based on Gender and City. For example, the female from Rome with a NaN value should get 15000, because the only female from Rome with a known age has 15000.

The male from Tokyo with a NaN value should also get 15000, because that is the median age of the males from Tokyo (10000, 15000, 20000).
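To double-check those numbers, here is a minimal sketch that rebuilds the sample data (typed in by hand from the table above) and prints the median Age for each (Gender, City) pair. median() skips NaN by default, so the missing rows do not affect the result.

```python
import numpy as np
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "Age": [10000, 15000, 20000, 12000, 14000, 16000, 15000, np.nan, np.nan, np.nan],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female",
               "Female", "Female", "Male", "Female"],
    "City": ["Tokyo", "Tokyo", "Tokyo", "Madrid", "Madrid", "Madrid",
             "Rome", "Rome", "Tokyo", "Rome"],
})

# One median per (Gender, City) pair; NaN values are ignored by median()
medians = df.groupby(["Gender", "City"])["Age"].median()
print(medians)
```

This gives 14000 for Female/Madrid and 15000 for both Female/Rome and Male/Tokyo, matching the values described above.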

I know I can fill with the median of the whole column, df['Age'] = df['Age'].fillna(median), but I want the median to be calculated within each Gender/City group instead.

Maybe something like this?

df['Age'] = df['Age'].fillna(df[['Age','Gender','City']].groupby(by=['Gender','City']).median())
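The attempt above does not fill anything useful because groupby(...).median() returns medians indexed by the (Gender, City) pairs instead of by the rows of df, so fillna cannot line them up with the missing rows. A minimal sketch of one way around that, assuming the sample data from the question, is to look up each row's group median and give the result df's own row index before calling fillna:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [10000, 15000, 20000, 12000, 14000, 16000, 15000, np.nan, np.nan, np.nan],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female",
               "Female", "Female", "Male", "Female"],
    "City": ["Tokyo", "Tokyo", "Tokyo", "Madrid", "Madrid", "Madrid",
             "Rome", "Rome", "Tokyo", "Rome"],
})

# Medians indexed by (Gender, City), not by row
medians = df.groupby(["Gender", "City"])["Age"].median()

# Build one (Gender, City) key per row and look up that row's group median
row_keys = pd.MultiIndex.from_frame(df[["Gender", "City"]])
row_medians = pd.Series(medians.reindex(row_keys).to_numpy(), index=df.index)

# The fill values now share df's row index, so fillna aligns them row by row
df["Age"] = df["Age"].fillna(row_medians)
print(df)
```

The answers below get the same per-row alignment more directly with groupby + transform or apply.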

How can I do this?

I'd appreciate your help.

Answers (2)

Corralien

Another way:

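# group_keys=False stops apply from prepending the group keys to the index, so the result aligns with df
df['Age'] = df.groupby(['Gender', 'City'], group_keys=False)['Age'].apply(lambda x: x.fillna(x.median()))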
print(df)

# Output
       Age  Gender    City
0  10000.0    Male   Tokyo
1  15000.0    Male   Tokyo
2  20000.0    Male   Tokyo
3  12000.0  Female  Madrid
4  14000.0  Female  Madrid
5  16000.0  Female  Madrid
6  15000.0  Female    Rome
7  15000.0  Female    Rome
8  15000.0    Male   Tokyo
9  15000.0  Female    Rome
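A closely related variant, sketched below on the same sample data, uses transform instead of apply. transform always returns a result with the same index as its input, so it can be assigned straight back to df['Age'] without depending on how apply handles group keys (that behavior changed in pandas 2.0).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [10000, 15000, 20000, 12000, 14000, 16000, 15000, np.nan, np.nan, np.nan],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female",
               "Female", "Female", "Male", "Female"],
    "City": ["Tokyo", "Tokyo", "Tokyo", "Madrid", "Madrid", "Madrid",
             "Rome", "Rome", "Tokyo", "Rome"],
})

# Fill each group's NaN with that group's own median; the output keeps df's index
df["Age"] = df.groupby(["Gender", "City"])["Age"].transform(lambda s: s.fillna(s.median()))
print(df)
```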

BENY

Try groupby with transform. Also, please make sure the missing values are real NaN, not the string 'Nan'.

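# Assign the result back instead of calling df.Age.fillna(..., inplace=True), which may not update df under Copy-on-Write
df['Age'] = df['Age'].fillna(df.groupby(['Gender', 'City'])['Age'].transform('median'))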
df
Out[628]: 
       Age  Gender    City
0  10000.0    Male   Tokyo
1  15000.0    Male   Tokyo
2  20000.0    Male   Tokyo
3  12000.0  Female  Madrid
4  14000.0  Female  Madrid
5  16000.0  Female  Madrid
6  15000.0  Female    Rome
7  15000.0  Female    Rome
8  15000.0    Male   Tokyo
9  15000.0  Female    Rome
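On the 'Nan' point: if the missing ages were read in as the text 'Nan', fillna treats them as ordinary strings and leaves them alone. A minimal sketch, using a small hypothetical frame with string ages, converts the column with pd.to_numeric(..., errors='coerce') first, which turns anything non-numeric into a real NaN, and then applies the same transform('median') fill:

```python
import pandas as pd

# Hypothetical input where ages are strings and the missing one is the text "Nan"
df = pd.DataFrame({
    "Age": ["10000", "15000", "20000", "Nan"],
    "Gender": ["Male", "Male", "Male", "Male"],
    "City": ["Tokyo", "Tokyo", "Tokyo", "Tokyo"],
})

# Convert to numbers; errors="coerce" maps anything unparseable to NaN
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

# Now the usual group-median fill works
df["Age"] = df["Age"].fillna(df.groupby(["Gender", "City"])["Age"].transform("median"))
print(df)
```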
