Reputation: 2587
I am working around with the "Titanic" dataset on Kaggle and want to fill the NaNs
in Age
column with values from a dataframe df
which also has the some common fields as my original dataframe Orig
.
I want only Sex, Pclass,Title
columns to be compared for filling NaNs.
The contents of df are:
Sex Pclass Title Survived Age Fare
0 0 1 Miss 0.958333 29.744681 121.694356
1 0 1 Mrs 0.976744 40.400000 92.615505
2 0 1 Officer 1.000000 49.000000 25.929200
3 0 1 Royalty 1.000000 40.500000 63.050000
4 0 2 Miss 0.941176 22.390625 20.393750
5 0 2 Mrs 0.904762 33.547619 23.246231
6 0 3 Miss 0.500000 16.123188 15.248043
7 0 3 Mrs 0.500000 33.515152 18.233529
8 1 1 Master 1.000000 5.306667 117.802767
9 1 1 Mr 0.345794 41.580460 68.034385
10 1 1 Officer 0.444444 51.125000 53.761111
11 1 1 Royalty 0.333333 42.333333 28.216667
12 1 2 Master 1.000000 2.258889 27.306022
13 1 2 Mr 0.087912 32.768293 19.252564
14 1 2 Officer 0.000000 42.000000 16.796875
15 1 3 Master 0.392857 5.350833 28.177232
16 1 3 Mr 0.112853 28.724891 11.299762
whereas Orig
is :
How shall I fill the NaNs
in Age
column in Orig
using df
?
Upvotes: 0
Views: 115
Reputation: 1726
Creating a list of columns of interest :
cols = [
*orig.columns.difference(df.columns),'Sex_x','Pclass_x',
'Title_x', 'Survived_x', 'Age_x', 'Age_y', 'Fare'
]
Doing a left merge (join) on columns Sex
, PClass
and Title
and filling the NaN
values with values from Age_y
:
orig = orig.merge(df, on=['Sex', 'PClass', 'Title'], how='left')[cols]
orig['Age'] = df['Age_x'].fillna(df['Age_y'])
Dropping the now redundant Age_x
and Age_y
columns :
orig.drop(columns=['Age_x', 'Age_y'], inplace=True)
Upvotes: 2