Clock Slave

Reputation: 7967

Pandas set values of multiple rows of a column

I have a DataFrame (train) with an Age column that contains missing values. I have merged it with another DataFrame, static_vals, which also has an Age column. I am using the lines below to fill in the missing values of the Age column from train.

predicted_vals = pd.merge(static_vals, train, on=['Pclass','Sex'])
# num of missing values
predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_y'].isna().sum() # 177
predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_y'] = predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_x']
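To make the steps above reproducible, here is a minimal sketch with made-up toy data. The values in train and static_vals are hypothetical; only the column names (Pclass, Sex, Age) follow the question. It shows that merge() suffixes the two Age columns as Age_x (from the left frame, static_vals) and Age_y (from the right frame, train), and that the .loc fill leaves no NaN behind.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's train and static_vals.
train = pd.DataFrame({
    "Pclass": [1, 3, 3, 1],
    "Sex": ["male", "female", "female", "male"],
    "Age": [40.0, np.nan, 22.0, np.nan],
})
# One fallback age per (Pclass, Sex) group (hypothetical values).
static_vals = pd.DataFrame({
    "Pclass": [1, 3],
    "Sex": ["male", "female"],
    "Age": [38.5, 25.0],
})

# Both frames have an Age column, so merge() adds suffixes:
# Age_x comes from static_vals (left), Age_y from train (right).
predicted_vals = pd.merge(static_vals, train, on=["Pclass", "Sex"])

missing = predicted_vals["Age_y"].isna()
print(missing.sum())  # 2 with this toy data

# Fill only the missing rows; the right-hand Series is aligned on index labels.
predicted_vals.loc[missing, "Age_y"] = predicted_vals.loc[missing, "Age_x"]

print(predicted_vals["Age_y"].isna().sum())  # 0 once every gap is filled
```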

After running the above lines, I run the following to check whether the values have been substituted:

predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_y']

And this is the output I get:

Series([], Name: Age_x, dtype: float64)

It's empty. No assignment has happened. The strange part is that when I check the Age_x column for those rows after running the above lines, I get an empty Series there too:

>>> predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_x']
Series([], Name: Age_x, dtype: float64)

Below is what the Age_x column holds for those rows right before I run the lines that assign the missing values:

>>> predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_x'].head()
3     34.240964
8     34.240964
15    34.240964
25    34.240964
34    34.240964

I searched for similar questions here, but they all deal with assigning a single value to many rows. I can't figure out what's wrong. Any help?
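Assigning a different value to each selected row (rather than one scalar) does work with .loc, because a Series on the right-hand side is matched to the selected rows by index label, not by position. A minimal sketch with a hypothetical frame and column name (a), where the replacement values are deliberately listed out of order:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps at index labels 11 and 13.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan]}, index=[10, 11, 12, 13])

# Per-row replacement values, listed in a different order than df.
fill = pd.Series({13: 130.0, 11: 110.0})

mask = df["a"].isna()
# .loc aligns the right-hand Series on index labels,
# so each selected row receives its own value.
df.loc[mask, "a"] = fill

print(df["a"].tolist())  # [1.0, 110.0, 3.0, 130.0]
```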

Upvotes: 1

Views: 4937

Answers (3)

jezrael

Reputation: 862661

I think you need combine_first:

predicted_vals['Age_y'] = predicted_vals['Age_y'].combine_first(predicted_vals['Age_x'])
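A small sketch of what combine_first does here, using two hypothetical Series that stand in for the Age_y and Age_x columns: it keeps the caller's value wherever one exists and takes the other Series' value elsewhere. For two Series sharing the same index, this gives the same result as fillna with a Series argument.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for predicted_vals['Age_y'] and predicted_vals['Age_x'].
age_y = pd.Series([40.0, np.nan, np.nan, 22.0])
age_x = pd.Series([38.5, 38.5, 25.0, 25.0])

# Keep age_y where it has a value; take age_x where age_y is NaN.
combined = age_y.combine_first(age_x)

# With identical indexes, fillna with a Series gives the same result.
filled = age_y.fillna(age_x)

print(combined.tolist())  # [40.0, 38.5, 25.0, 22.0]
```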

Upvotes: 1

Stev

Reputation: 1140

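Is there actually a problem here? predicted_vals.loc[predicted_vals['Age_y'].isna(),'Age_y'] should be empty, because you have already filled those values! The same goes for your Age_x check: the mask predicted_vals['Age_y'].isna() is re-evaluated after the fill, so it no longer selects any rows. To see the filled values (together with the ones that were already present), try predicted_vals.loc[~predicted_vals['Age_y'].isna(),'Age_y']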
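If you want to inspect exactly the rows that were filled, one option is to store the boolean mask before assigning and reuse it afterwards. A minimal sketch on a hypothetical stand-in for the merged frame (the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged predicted_vals frame.
predicted_vals = pd.DataFrame({
    "Age_x": [38.5, 38.5, 25.0, 25.0],
    "Age_y": [40.0, np.nan, np.nan, 22.0],
})

# Save the mask BEFORE filling; re-evaluating isna() afterwards selects nothing.
was_missing = predicted_vals["Age_y"].isna()
predicted_vals.loc[was_missing, "Age_y"] = predicted_vals.loc[was_missing, "Age_x"]

# Re-computed mask: no NaN left, so the selection is empty.
print(predicted_vals.loc[predicted_vals["Age_y"].isna(), "Age_y"].empty)  # True
# Saved mask: shows the values that were just filled in.
print(predicted_vals.loc[was_missing, "Age_y"].tolist())  # [38.5, 25.0]
```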

Upvotes: 3

jpp

Reputation: 164673

This is an alternative solution, which avoids merging and having to handle column-name suffixes. We align the two indices and use fillna to map the missing values from static_vals.

predicted_vals = train.set_index(['Pclass','Sex'])

predicted_vals['Age'] = predicted_vals['Age'].fillna(static_vals.set_index(['Pclass','Sex'])['Age'])

predicted_vals = predicted_vals.reset_index()
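Here is a self-contained sketch of the same index-alignment idea on hypothetical toy data (only the column names follow the question). It starts from the unmerged train frame, since that is the one with a single Age column. fillna reindexes the lookup Series to train's (Pclass, Sex) MultiIndex, so repeated keys in train are fine, but the keys in static_vals must be unique for the reindex to succeed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names follow the question.
train = pd.DataFrame({
    "Pclass": [1, 3, 3, 1],
    "Sex": ["male", "female", "female", "male"],
    "Age": [40.0, np.nan, 22.0, np.nan],
})
static_vals = pd.DataFrame({
    "Pclass": [1, 3],
    "Sex": ["male", "female"],
    "Age": [38.5, 25.0],
})

keys = ["Pclass", "Sex"]
# Lookup Series indexed by (Pclass, Sex); these keys must be unique.
lookup = static_vals.set_index(keys)["Age"]

filled = train.set_index(keys)
# fillna aligns the lookup on the MultiIndex, filling only NaN entries.
filled["Age"] = filled["Age"].fillna(lookup)
filled = filled.reset_index()

print(filled["Age"].tolist())  # [40.0, 25.0, 22.0, 38.5]
```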

If you would like to do an explicit merge, @jezrael's solution is the way to go.

Upvotes: 1
