Bibin Hashley

Reputation: 31

How is accessing a column inside a function different from accessing it outside the function in a pandas DataFrame?

import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [11, 12, 14, 15, np.nan, np.nan], "Class": [10, 11, 10, 11, 9, 9]})
df

def impute_age(cols):
    Age = cols[0]
    Class = cols[1]

    if np.isnan(Age):

        if Class == 10:
            return 11

        elif Class == 11:
            return 12

        else:
            return 9

    else:
        return Age

df.apply(impute_age,axis=1)

Inside the impute_age function, the Age value is read with cols[0]. But if we try to access a column of the DataFrame outside the function by position in the same way (for example df[0]), it raises an error, because we have to specify the column name. Why?

Upvotes: 1

Views: 100

Answers (2)

Invibsid

Reputation: 36

From the DataFrame.apply documentation:

Objects passed to the function are Series objects [...]

This means that what's passed to impute_age is a Series, not the complete DataFrame. In other words, with axis=1 the function is applied not to df but to each row df.loc[i] in turn, so cols[0] is the first value in that row. For example, printing df.loc[0][0] gives the Age value of the first row.
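A minimal sketch of that difference, reusing the DataFrame from the question. The helper name record_type and the variables seen_types and column_lookup_failed are hypothetical, introduced only for illustration. It records the type of the object that apply hands to the function, then shows that the same integer key fails on the DataFrame itself:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [11, 12, 14, 15, np.nan, np.nan],
                   "Class": [10, 11, 10, 11, 9, 9]})

seen_types = []

def record_type(row):
    # With axis=1, `row` is a single row of df, delivered as a Series
    seen_types.append(type(row).__name__)
    # Explicit positional access to the first value in the row (Age)
    return row.iloc[0]

first_values = df.apply(record_type, axis=1)

# On the DataFrame itself, df[0] looks for a column *labelled* 0.
# The only column labels are "Age" and "Class", so the lookup fails.
try:
    df[0]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True
```

So the same key means different things: on a row Series it can pick a value within the row, while on the DataFrame it is treated as a column label.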

Upvotes: 1

jezrael

Reputation: 862641

If you check with print(cols), you will see that each row of the DataFrame is passed as a Series, so if you want to select by position, use iat:

def impute_age(cols):
    print (cols)
    Age = cols.iat[0]
    Class = cols.iat[1]
    if np.isnan(Age):

        if Class == 10:
            return 11

        elif Class == 11:
            return 12

        else:
            return 9

    else:
        return Age

Or select by column name:

def impute_age(cols):
    print (cols)
    Age = cols['Age']
    Class = cols['Class']
    if np.isnan(Age):

        if Class == 10:
            return 11

        elif Class == 11:
            return 12

        else:
            return 9

    else:
        return Age

df['Age'] = df.apply(impute_age, axis=1)
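As a quick check that selecting by position and selecting by column name give the same result, here is a compact sketch. The names FALLBACK, impute_by_position and impute_by_label are hypothetical and only condense the branches of impute_age above; they are not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [11, 12, 14, 15, np.nan, np.nan],
                   "Class": [10, 11, 10, 11, 9, 9]})

# Replacement ages per class, taken from the branches of impute_age;
# any other class falls back to 9
FALLBACK = {10: 11, 11: 12}

def impute_by_position(row):
    # iat selects by integer position within the row Series
    age, cls = row.iat[0], row.iat[1]
    return FALLBACK.get(cls, 9) if np.isnan(age) else age

def impute_by_label(row):
    # Label-based selection keeps working if the column order changes
    age, cls = row["Age"], row["Class"]
    return FALLBACK.get(cls, 9) if np.isnan(age) else age

by_position = df.apply(impute_by_position, axis=1)
by_label = df.apply(impute_by_label, axis=1)
```

Both calls return a Series with one imputed value per row, so either can be assigned back to the Age column.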

Upvotes: 1
