Reputation: 21
My dataset has a column called age and I'm trying to count the null values in it.
I know this can be done easily with something like len(df) - df['age'].count(). However, I'm playing around with functions and would like to apply a function to calculate the null count.
Here is what I have:
def age_is_null(df):
    age_col = df['age']
    null = df[age_col].isnull()
    age_null = df[null]
    return len(age_null)

count = df.apply(age_is_null)
print(count)
When I do that, I get an error: KeyError: 'age'.
Can someone tell me why I'm getting that error and what I should change in the code to make it work?
Upvotes: 1
Views: 1345
Reputation: 863741
You need DataFrame.pipe, or you can pass the DataFrame to the function directly:
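# the function can be simplified
def age_is_null(df):
    return df['age'].isnull().sum()

# option 1: DataFrame.pipe passes the whole DataFrame to the function
count = df.pipe(age_is_null)
print(count)

# option 2: call the function directly
count = age_is_null(df)
print(count)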
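The error occurs because DataFrame.apply calls the function once per column (the default is axis=0) and passes each column as a Series, not the whole DataFrame. Selecting the column age inside the function therefore fails with a KeyError. You can see what the function receives with: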
def func(x):
    print(x)

df.apply(func)
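To make the per-column behaviour concrete, here is a minimal sketch. The sample DataFrame and the helper name record are made up for illustration, since the question does not show the real data. It records what apply hands to the function, reproduces the KeyError, and then shows a function written for a single column, which is the shape apply expects.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real dataset is not shown in the question.
df = pd.DataFrame({"name": ["a", "b", "c"], "age": [30.0, np.nan, np.nan]})

seen = []

def record(col):
    # With the default axis=0, `col` is a single column (a Series).
    seen.append((type(col).__name__, col.name))
    return 0

df.apply(record)

# Indexing a column Series with the label 'age' looks in the row index,
# where no such label exists, hence KeyError: 'age'.
try:
    df.apply(lambda col: col["age"])
    raised = False
except KeyError:
    raised = True

# A function that works on one column fits apply naturally:
# this gives the null count of every column as a Series.
null_counts = df.apply(lambda col: col.isnull().sum())
print(null_counts)
```

If only the age column matters, null_counts["age"] picks out that single count.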
EDIT: To select the column, use the column name:
def age_is_null(df):
    age_col = 'age'  # <- here
    null = df[age_col].isnull()
    age_null = df[null]
    return len(age_null)
Or use the selected column itself to build the mask:
def age_is_null(df):
    age_col = df['age']
    null = age_col.isnull()  # <- here
    age_null = df[null]
    return len(age_null)
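As a quick sanity check, the corrected function can be run against a small DataFrame and compared with the len(df) - df['age'].count() approach from the question. This is only a sketch: the sample values below are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing ages.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "age": [30.0, np.nan, 25.0, np.nan],
})

def age_is_null(df):
    # Boolean mask built from the column itself, then used to filter rows.
    null = df["age"].isnull()
    return len(df[null])

count_call = age_is_null(df)                   # direct call
count_pipe = df.pipe(age_is_null)              # same call via DataFrame.pipe
count_baseline = len(df) - df["age"].count()   # approach from the question

print(count_call, count_pipe, count_baseline)
```

All three values should agree, because count() skips nulls and isnull() flags exactly those rows.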
Upvotes: 2
Reputation: 157
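You need to pass the DataFrame df itself when calling age_is_null. df.apply passes one column at a time instead, which is why the age column is not recognised. The mask line inside the function also has to use df['age'].isnull() rather than df[age_col].isnull().
count = age_is_null(df)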
Upvotes: 0
Reputation: 1691
Instead of writing a function, you can try this. shape returns a (rows, columns) tuple, so take the first element to get the count:
df[df["age"].isnull()].shape[0]
Upvotes: 0