I'm digging into Kaggle's Titanic exercise.
I have a pandas.DataFrame whose column Age has some NaN values scattered around, and another column I created, IsAlone, whose values are 1 or 0 depending on whether the person was alone on the ship, based on a personal rule.

I'm trying to replace the NaN values in column Age for people who were alone with the mean age of those who were alone, and likewise for those who weren't alone with the mean age of those who weren't. The purpose is just to practice with pandas DataFrames, replacing NaN values based on a rule.
I'm doing this to those who were alone:
df_train[(df_train.IsAlone.astype(bool) & df_train.Age.isnull() )].Age = \
df_train[(df_train.IsAlone.astype(bool) & ~df_train.Age.isnull() )].Age.mean()
And likewise for those who weren't alone:
df_train[(~df_train.IsAlone.astype(bool) & df_train.Age.isnull() )].Age = \
df_train[(~df_train.IsAlone.astype(bool) & ~df_train.Age.isnull() )].Age.mean()
But this is not working at all: the Age column still has the same NaN values.
Any thoughts on this?
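The problem is that the values are assigned to a copy of the original frame: df_train[mask] with a boolean mask returns a new DataFrame, so setting .Age on it never touches df_train. See "Returning a view versus a copy" in the pandas documentation for details. As the documentation puts it: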
When setting values in a pandas object, care must be taken to avoid what is called chained indexing.
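A minimal reproduction of the difference, using a made-up four-row frame (the values are hypothetical, not the Kaggle data): the chained assignment leaves the original NaN values in place, while a single .loc call with a row mask and a column label modifies the frame itself. Depending on the pandas version, the chained form may emit a SettingWithCopyWarning or a chained-assignment warning; the sketch silences it only to keep the output clean.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic training set
df = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0, np.nan],
    "IsAlone": [1, 1, 0, 0],
})

# Passengers who were alone and whose age is missing
mask = df.IsAlone.astype(bool) & df.Age.isnull()

# Chained indexing: df[mask] builds a new DataFrame, so the assignment
# lands on that temporary copy and is then thrown away.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[mask].Age = 99.0
print(df.Age.isnull().sum())  # 2: the original frame is untouched

# One .loc call (row mask + column label) writes into df itself
df.loc[mask, "Age"] = 99.0
print(df.Age.isnull().sum())  # 1: only the not-alone passenger's NaN remains
```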
To change the values in the original frame, select the rows and the column in a single .loc call:
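# Passengers who were alone and whose age is missing
j = df_train.IsAlone.astype(bool) & df_train.Age.isnull()
# Passengers who were alone and whose age is known
i = df_train.IsAlone.astype(bool) & ~df_train.Age.isnull()
# Fill the missing ages in place with the mean of the known ones
df_train.loc[j, 'Age'] = df_train.loc[i, 'Age'].mean()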
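The same .loc pattern extends to the passengers who were not alone. As an alternative (a different technique from the one above, not taken from the original answer), both groups can be filled in one step: groupby("IsAlone")["Age"].transform("mean") gives each row its group's mean age, skipping NaN, and fillna uses that value only where Age is missing. A sketch on a hypothetical six-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; IsAlone is 1 for passengers travelling alone, else 0
df_train = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0, 30.0, np.nan, 50.0],
    "IsAlone": [1, 1, 1, 0, 0, 0],
})

# Option 1: one .loc assignment per group
alone = df_train.IsAlone.astype(bool)
missing = df_train.Age.isnull()
df_loc = df_train.copy()
df_loc.loc[alone & missing, "Age"] = df_train.loc[alone & ~missing, "Age"].mean()
df_loc.loc[~alone & missing, "Age"] = df_train.loc[~alone & ~missing, "Age"].mean()

# Option 2: per-row group mean (NaN is ignored by mean), used only where Age is NaN
group_mean = df_train.groupby("IsAlone")["Age"].transform("mean")
df_grp = df_train.copy()
df_grp["Age"] = df_grp["Age"].fillna(group_mean)

print(df_grp.Age.tolist())  # [20.0, 30.0, 40.0, 30.0, 40.0, 50.0]
```

The groupby version scales to any number of groups without writing one mask per group, while the explicit .loc version makes each rule visible.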