asn
asn

Reputation: 2587

Filling NaNs with values from column of another dataframe by comparing the columns

I am working around with the "Titanic" dataset on Kaggle and want to fill the NaNs in Age column with values from a dataframe df which also has the some common fields as my original dataframe Orig.

I want only Sex, Pclass,Title columns to be compared for filling NaNs.

The contents of df are:

  Sex Pclass Title  Survived       Age      Fare
0   0   1   Miss    0.958333    29.744681   121.694356
1   0   1   Mrs     0.976744    40.400000   92.615505
2   0   1   Officer 1.000000    49.000000   25.929200
3   0   1   Royalty 1.000000    40.500000   63.050000
4   0   2   Miss    0.941176    22.390625   20.393750
5   0   2   Mrs     0.904762    33.547619   23.246231
6   0   3   Miss    0.500000    16.123188   15.248043
7   0   3   Mrs     0.500000    33.515152   18.233529
8   1   1   Master  1.000000    5.306667    117.802767
9   1   1   Mr      0.345794    41.580460   68.034385
10  1   1   Officer 0.444444    51.125000   53.761111
11  1   1   Royalty 0.333333    42.333333   28.216667
12  1   2   Master  1.000000    2.258889    27.306022
13  1   2   Mr      0.087912    32.768293   19.252564
14  1   2   Officer 0.000000    42.000000   16.796875
15  1   3   Master  0.392857    5.350833    28.177232
16  1   3   Mr      0.112853    28.724891   11.299762

whereas Orig is :

enter image description here

How shall I fill the NaNs in Age column in Orig using df ?

Upvotes: 0

Views: 115

Answers (1)

Vishnu Kunchur
Vishnu Kunchur

Reputation: 1726

Creating a list of columns of interest :

cols = [
    *orig.columns.difference(df.columns),'Sex_x','Pclass_x',
    'Title_x', 'Survived_x', 'Age_x', 'Age_y', 'Fare'
]
Doing a left merge (join) on columns Sex, PClass and Title and filling the NaN values with values from Age_y :
  orig = orig.merge(df, on=['Sex', 'PClass', 'Title'], how='left')[cols]
    orig['Age'] = df['Age_x'].fillna(df['Age_y'])
Dropping the now redundant Age_x and Age_y columns :
orig.drop(columns=['Age_x', 'Age_y'], inplace=True)

Upvotes: 2

Related Questions