Using a For-Loop in a pandas DataFrame (Python)

Question

I have a DataFrame (df) in Python with a few features, but I'm only going to work with the Age and Age_Mean columns.

The Age column contains several null values. I would like to replace each null value with the value at the same index in the Age_Mean column.

Here is the code I used:

    for i in df:
        if df['Age'].isnull().iloc[i] == True:
            df['Age'].iloc[i] == df['Age_Mean'].iloc[i]

This is my error message:

    KeyError: 'the label [Age] is not in the [index]'

Please let me know what is wrong with this code.

Parthasarathy Subburaj · Accepted Answer

The statement `for i in df` iterates over the column names, not over the rows. Let's take an example to understand this better:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({"Age": np.array([2, 3, np.nan, 8, np.nan]), "Age_mean": np.array([2, 5, 9, 2, 1])})
    df

So the DataFrame will look like this:

    Age Age_mean
0   2.0 2
1   3.0 5
2   NaN 9
3   8.0 2
4   NaN 1

Now let's see what the for loop iterates over:

    for i in df:
        print(i)

OUTPUT

    Age
    Age_mean

So when you execute `df['Age'].isnull().iloc[i]`, it throws an error, because `i` is the column label 'Age' (a string), not an integer row position.
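If you do want to keep a loop, a minimal sketch is to iterate over the row labels (`df.index`) and assign with a single `.loc` call. Note also that the line `df['Age'].iloc[i] == df['Age_Mean'].iloc[i]` in the question uses `==`, which only compares the two values and assigns nothing; assignment needs `=`. Assigning through chained indexing (`df['Age'].iloc[i] = ...`) may also fail to modify `df` itself, which is why the sketch writes through `df.loc[row, column]`. It uses the column names from the example DataFrame (`Age_mean`), not `Age_Mean` from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": np.array([2, 3, np.nan, 8, np.nan]),
                   "Age_mean": np.array([2, 5, 9, 2, 1])})

# Iterate over row labels, not column names
for idx in df.index:
    if pd.isnull(df.loc[idx, "Age"]):
        # One .loc call with (row, column) writes into df itself,
        # avoiding chained indexing; note "=" (assign), not "==" (compare)
        df.loc[idx, "Age"] = df.loc[idx, "Age_mean"]

print(df)
```

Row-by-row loops are slow on large frames, so the vectorized approach below is usually preferable.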

PROPOSED SOLUTION:

We can do this without a for loop as shown below:

    # Index labels of the rows where Age is NaN
    nan_index = df['Age'].index[df['Age'].apply(np.isnan)]
    # Copy Age_mean into Age for exactly those rows
    df.loc[nan_index, "Age"] = df.loc[nan_index, "Age_mean"]

The first line returns the index labels of the rows where Age is NaN. Once we have those, we just need to replace those entries with the corresponding values from the Age_mean column, which is what the second statement does.
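A shorter alternative, not part of the original answer, is `Series.fillna`, which also accepts another Series as the fill value and fills each missing entry from the value with the same index label. A minimal sketch using the same example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": np.array([2, 3, np.nan, 8, np.nan]),
                   "Age_mean": np.array([2, 5, 9, 2, 1])})

# fillna aligns the two Series on their index, so each NaN in Age
# is replaced by the Age_mean value from the same row
df["Age"] = df["Age"].fillna(df["Age_mean"])

print(df)
```

This relies on both columns sharing the same index, which is always the case for two columns of one DataFrame.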

OUTPUT

    Age Age_mean
0   2.0 2
1   3.0 5
2   9.0 9
3   8.0 2
4   1.0 1
