Reputation: 141
I have a Pandas DataFrame like this:
Age | Gender | City |
---|---|---|
10000 | Male | Tokyo |
15000 | Male | Tokyo |
20000 | Male | Tokyo |
12000 | Female | Madrid |
14000 | Female | Madrid |
16000 | Female | Madrid |
15000 | Female | Rome |
NaN | Female | Rome |
NaN | Male | Tokyo |
NaN | Female | Rome |
For the last 3 rows, I'd like to impute the median based on gender and city. For example, the Female rows in Rome with a NaN value would get 15000, because the only other Female in Rome has 15000.
For the Male row from Tokyo with a NaN value, it would be 15000, because that is the median of the Males in Tokyo.
I know I can fill with the median of the whole column using df['Age'] = df['Age'].fillna(median), but I want to calculate it using the other categorical columns too.
Maybe something like this?
df['Age'] = df['Age'].fillna(df[['Age','Gender','City']].groupby(by=['Gender','City']).median())
How can I do this?
I appreciate your help.
Upvotes: 1
Views: 2097
Reputation: 120409
Another way:
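# group_keys=False keeps the original row index, so the result aligns with df on recent pandas versions
df['Age'] = df.groupby(['Gender', 'City'], group_keys=False)['Age'].apply(lambda x: x.fillna(x.median()))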
print(df)
# Output
Age Gender City
0 10000.0 Male Tokyo
1 15000.0 Male Tokyo
2 20000.0 Male Tokyo
3 12000.0 Female Madrid
4 14000.0 Female Madrid
5 16000.0 Female Madrid
6 15000.0 Female Rome
7 15000.0 Female Rome
8 15000.0 Male Tokyo
9 15000.0 Female Rome
Upvotes: 1
Reputation: 323226
Let us try groupby with transform. Also, please make sure the missing values are real NaN, not the string 'Nan'.
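# assign the result back instead of using inplace=True on df.Age, which may not update df on recent pandas versions
df['Age'] = df['Age'].fillna(df.groupby(['Gender','City'])['Age'].transform('median'))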
df
Out[628]:
Age Gender City
0 10000.0 Male Tokyo
1 15000.0 Male Tokyo
2 20000.0 Male Tokyo
3 12000.0 Female Madrid
4 14000.0 Female Madrid
5 16000.0 Female Madrid
6 15000.0 Female Rome
7 15000.0 Female Rome
8 15000.0 Male Tokyo
9 15000.0 Female Rome
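If the missing values were loaded as text rather than real NaN, the fill will not touch them. Here is a minimal sketch of the whole flow, assuming the missing entries arrived as the string 'Nan' (the DataFrame below is rebuilt from the question's table, and the `group_median` name is just for illustration):

```python
import pandas as pd

# Rebuild the question's data, assuming the missing ages came in as the string 'Nan'
df = pd.DataFrame({
    "Age": [10000, 15000, 20000, 12000, 14000, 16000, 15000, "Nan", "Nan", "Nan"],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female",
               "Female", "Female", "Male", "Female"],
    "City": ["Tokyo", "Tokyo", "Tokyo", "Madrid", "Madrid", "Madrid",
             "Rome", "Rome", "Tokyo", "Rome"],
})

# Turn anything that is not a number into a real NaN; the column becomes float
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

# Median of each (Gender, City) group, broadcast back to every row of that group
group_median = df.groupby(["Gender", "City"])["Age"].transform("median")

# Fill only the missing ages with their group's median
df["Age"] = df["Age"].fillna(group_median)
print(df)
```

Note that a group with no known ages at all would still be NaN after this, since its median is NaN too.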
Upvotes: 3