Reputation: 47
I have a dataframe (df) in Python with a few features, but I'm only going to work with the Age and Age_Mean columns.
The Age column has several null values. I would like to replace each null value with the value at the same index in the Age_Mean column.
Here is the code I used:
for i in df:
    if df['Age'].isnull().iloc[i] == True:
        df['Age'].iloc[i] == df['Age_Mean'].iloc[i]
This is my error message:
KeyError: 'the label [Age] is not in the [index]'
Please let me know what is wrong with this code.
Upvotes: 2
Views: 82
Reputation: 4264
The statement for i in df iterates over the column names, not the row indices. Let's take an example to understand this better:
import numpy as np
import pandas as pd
df = pd.DataFrame({"Age":np.array([2,3,np.nan,8,np.nan]),"Age_mean":np.array([2,5,9,2,1])})
df
The data frame will look like this:
   Age  Age_mean
0  2.0         2
1  3.0         5
2  NaN         9
3  8.0         2
4  NaN         1
Now let's see what the for loop iterates over:
for i in df:
    print(i)
OUTPUT
Age
Age_mean
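When you then execute df['Age'].isnull().iloc[i], it throws an error because i holds a column name (Age on the first pass), not an integer row position. Note also that the line inside the if uses == (a comparison) where = (an assignment) was intended, so it would not change df even with a valid index.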
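If a row-wise loop is still wanted, a minimal sketch of one might look like the following. It rebuilds the example frame from above (column name Age_mean), and the names labels, age_col and mean_col are introduced here purely for illustration. It loops over integer positions and assigns with = instead of comparing with ==.

```python
import numpy as np
import pandas as pd

# Example frame from this answer (assumed column names: Age, Age_mean)
df = pd.DataFrame({"Age": [2, 3, np.nan, 8, np.nan],
                   "Age_mean": [2, 5, 9, 2, 1]})

# Iterating over a DataFrame yields its column labels, not row positions
labels = list(df)

# Integer positions of the two columns, for use with .iloc
age_col = df.columns.get_loc("Age")
mean_col = df.columns.get_loc("Age_mean")

# Walk the rows by integer position and fill each missing Age
for i in range(len(df)):
    if pd.isnull(df.iloc[i, age_col]):
        df.iloc[i, age_col] = df.iloc[i, mean_col]
```

Writing through df.iloc[i, age_col] in one step also avoids chained indexing such as df['Age'].iloc[i] = ..., which may end up modifying a temporary copy.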
PROPOSED SOLUTION:
We can do this without a for loop as shown below:
# Index labels of the rows where Age is NaN
nan_index = df['Age'].index[df['Age'].apply(np.isnan)]
# Copy Age_mean into Age for those rows only
df.loc[nan_index,"Age"] = df.loc[nan_index,"Age_mean"]
The first line returns the indices of the rows where the value of Age is NaN. Once we know those, we just need to replace them with the values from the Age_mean column, which is what the second statement does.
OUTPUT
   Age  Age_mean
0  2.0         2
1  3.0         5
2  9.0         9
3  8.0         2
4  1.0         1
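As a possible alternative to the two statements in the proposed solution, the same result can probably be reached in one step with Series.fillna, which accepts another Series and aligns it on the index. This is only a sketch on the example frame used in this answer (assumed column name Age_mean), not necessarily the best fit for every dataset.

```python
import numpy as np
import pandas as pd

# Example frame from this answer (assumed column names: Age, Age_mean)
df = pd.DataFrame({"Age": [2, 3, np.nan, 8, np.nan],
                   "Age_mean": [2, 5, 9, 2, 1]})

# Fill each missing Age with the Age_mean value that shares its index label;
# rows where Age is already present are left unchanged
df["Age"] = df["Age"].fillna(df["Age_mean"])
```

Because the fill values are matched by index label, no explicit list of NaN positions is needed.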
Upvotes: 2