Reputation: 7967
I have a dataframe (train) that has an Age column with missing values. I have merged it with another dataframe, static_vals, which also has an Age column. I am using the lines below to fill in the missing values of the Age column that came from the train df.
predicted_vals = pd.merge(static_vals, train, on=['Pclass','Sex'])
# num of missing values
predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_y'].isna().sum() # 177
predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_y'] = predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_x']
After running the above lines, I run the following to see if the values have been substituted:
predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_y']
And this is the output I get:
Series([], Name: Age_x, dtype: float64)
It's empty. No assignment has happened. The strange part is that when I check the values of the Age_x column after running the above lines, I get an empty Series there too.
>>> predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_x']
Series([], Name: Age_x, dtype: float64)
Below is what the column holds right before I run the lines where I am trying to assign the missing values
>>> predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_x'].head()
3 34.240964
8 34.240964
15 34.240964
25 34.240964
34 34.240964
I searched here for similar questions, but they all deal with assigning a single value to many rows. I can't figure out what's wrong here. Any help?
Upvotes: 1
Views: 4937
Reputation: 862661
I think you need combine_first:
predicted_vals['Age_y'] = predicted_vals['Age_y'].combine_first(predicted_vals['Age_x'])
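To illustrate what combine_first does here, below is a minimal sketch on a small made-up frame. The values are hypothetical and only stand in for the merged predicted_vals from the question: Age_y plays the role of the original ages with gaps, Age_x the group values to fall back on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the merged frame from the question.
predicted_vals = pd.DataFrame({
    "Age_x": [34.0, 28.5, 34.0, 21.0],      # fallback values (from static_vals)
    "Age_y": [22.0, np.nan, np.nan, 40.0],  # original ages (from train), with gaps
})

# Keep Age_y where it is present; take Age_x only where Age_y is NaN.
predicted_vals["Age_y"] = predicted_vals["Age_y"].combine_first(predicted_vals["Age_x"])

filled_ages = predicted_vals["Age_y"].tolist()
remaining_missing = int(predicted_vals["Age_y"].isna().sum())
```

Rows that already had an age are left untouched, so only the gaps change.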
Upvotes: 1
Reputation: 1140
Is there actually a problem here?
predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_y']
should be empty because you have filled the values! Try predicted_vals.loc[~predicted_vals['Age_y'].isna(),'Age_y']
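A small sketch of why the check comes back empty, using made-up values in place of the real data: the isna() mask is recomputed after the fill, so it no longer selects any rows. Saving the mask before assigning (the name was_missing is my own) lets you look at exactly the rows that were filled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the merged frame from the question.
predicted_vals = pd.DataFrame({
    "Age_x": [34.0, 28.5, 34.0, 21.0],
    "Age_y": [22.0, np.nan, np.nan, 40.0],
})

# Record which rows are missing BEFORE filling them.
was_missing = predicted_vals["Age_y"].isna()
n_missing_before = int(was_missing.sum())

# Same assignment as in the question.
predicted_vals.loc[was_missing, "Age_y"] = predicted_vals.loc[was_missing, "Age_x"]

# A freshly computed mask now selects nothing, because nothing is NaN any more.
n_missing_after = int(predicted_vals["Age_y"].isna().sum())

# The saved mask still points at the rows that were filled.
filled_rows = predicted_vals.loc[was_missing, "Age_y"]
```

This also explains the empty Age_x result in the question: that lookup uses the same recomputed Age_y mask, which is all False after the fill.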
Upvotes: 3
Reputation: 164673
This is an alternative solution that avoids merging and handling column name suffixes. We align the two indices and use fillna to map from static_vals.
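# Start from train (not the merged frame) and index it by the lookup keys
predicted_vals = train.set_index(['Pclass','Sex'])
# fillna aligns on the (Pclass, Sex) index, so each missing Age gets its group's value
predicted_vals['Age'] = predicted_vals['Age'].fillna(static_vals.set_index(['Pclass','Sex'])['Age'])
predicted_vals = predicted_vals.reset_index()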
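Here is a self-contained sketch of the index-aligned fillna on made-up data. The frames and values are hypothetical, and it assumes static_vals has exactly one row per (Pclass, Sex) pair, since the lookup needs a unique index to align against.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: one Age per (Pclass, Sex) pair.
static_vals = pd.DataFrame({
    "Pclass": [1, 1, 3, 3],
    "Sex": ["female", "male", "female", "male"],
    "Age": [34.0, 41.0, 21.0, 26.0],
})

# Hypothetical train frame with gaps in Age; (3, "male") appears twice.
train = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Sex": ["male", "female", "male", "male"],
    "Age": [22.0, np.nan, np.nan, 54.0],
})

# Series of fallback ages keyed by (Pclass, Sex).
lookup = static_vals.set_index(["Pclass", "Sex"])["Age"]

# Index train by the same keys so fillna can align on them.
filled = train.set_index(["Pclass", "Sex"])
filled["Age"] = filled["Age"].fillna(lookup)
filled = filled.reset_index()

filled_ages = filled["Age"].tolist()
```

Existing ages are kept; only the NaN rows pick up the value for their (Pclass, Sex) group, and no Age_x/Age_y suffix columns are created.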
If you would like to do an explicit merge, @jezrael's solution is the way to go.
Upvotes: 1