Reputation: 31
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [11, 12, 14, 15, np.nan, np.nan], "Class": [10, 11, 10, 11, 9, 9]})
df

def impute_age(cols):
    Age = cols[0]
    Class = cols[1]
    if np.isnan(Age):
        # Fill a missing age based on the class
        if Class == 10:
            return 11
        elif Class == 11:
            return 12
        else:
            return 9
    else:
        return Age

df.apply(impute_age, axis=1)
Inside the impute_age function, the first column's value is accessed as cols[0]. But if I try to access a column of the DataFrame outside the function by position in the same way, I get an error, because the column name has to be specified. Why?
Upvotes: 1
Views: 100
Reputation: 36
From the DataFrame.apply documentation:
Objects passed to the function are Series objects [...]
This means that what's passed to impute_age is a Series, not the complete DataFrame. With axis=1 the function is not applied to df but to df.loc[i] (for each possible i). If you print df.loc[0][0] you'll get the Age value of the first row.
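To make this concrete, here is a small sketch (the helper name inspect_row and the seen list are made up for illustration) that records what apply hands to the function with axis=1, and shows that the same integer lookup on the DataFrame itself fails:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [11, 12, 14, 15, np.nan, np.nan],
                   "Class": [10, 11, 10, 11, 9, 9]})

seen = []  # hypothetical log of what apply passes in

def inspect_row(row):
    # Record the type of the argument and its index labels
    seen.append((type(row).__name__, list(row.index)))
    # Positional access on the row Series
    return row.iat[0]

first_values = df.apply(inspect_row, axis=1)
row_type, row_labels = seen[0]
print(row_type, row_labels)

# On the DataFrame, df[0] is looked up as a column label, and no column is named 0
try:
    df[0]
    df_lookup_failed = False
except KeyError:
    df_lookup_failed = True
print(df_lookup_failed)
```

So each row arrives as a Series whose index holds the column names, which is why positions 0 and 1 line up with Age and Class inside the function, while df[0] is treated as a column label.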
Upvotes: 1
Reputation: 862641
If you check print(cols), you'll see that each row of the DataFrame is passed as a Series, so if you want to select by position, use iat:
def impute_age(cols):
    print(cols)
    Age = cols.iat[0]
    Class = cols.iat[1]
    if np.isnan(Age):
        if Class == 10:
            return 11
        elif Class == 11:
            return 12
        else:
            return 9
    else:
        return Age
Or select by column name:
def impute_age(cols):
    print(cols)
    Age = cols['Age']
    Class = cols['Class']
    if np.isnan(Age):
        if Class == 10:
            return 11
        elif Class == 11:
            return 12
        else:
            return 9
    else:
        return Age

df['Age'] = df.apply(impute_age, axis=1)
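As a quick check, here is a sketch that runs both variants side by side on the question's data. The names fallback_age, impute_by_position and impute_by_label are only for illustration, and the result is written into a copy via assign rather than overwriting df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [11, 12, 14, 15, np.nan, np.nan],
                   "Class": [10, 11, 10, 11, 9, 9]})

def fallback_age(cls):
    # Same rules as in the question: class 10 -> 11, class 11 -> 12, otherwise 9
    if cls == 10:
        return 11
    elif cls == 11:
        return 12
    return 9

def impute_by_position(row):
    age, cls = row.iat[0], row.iat[1]
    return fallback_age(cls) if np.isnan(age) else age

def impute_by_label(row):
    age, cls = row["Age"], row["Class"]
    return fallback_age(cls) if np.isnan(age) else age

by_position = df.apply(impute_by_position, axis=1)
by_label = df.apply(impute_by_label, axis=1)
same = by_position.equals(by_label)

# Keep the original frame intact and put the filled ages into a copy
filled = df.assign(Age=by_label)
print(filled)
```

Both versions should give the same Series. Selecting by label is usually the safer choice, since it keeps working if the column order changes.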
Upvotes: 1