Reputation:
I am trying to find the null values in a DataFrame. I reviewed the following Stack Overflow post, which describes how to count null values, but I am having a hard time doing the same for my dataset.
How to count the Nan values in the column in Panda Data frame
Working code:
import pandas as pd
a = ['america','britain','brazil','','china','jamaica'] #I deliberately introduce a NULL value
a = pd.DataFrame(a)
a.isnull()
#Output:
0
0 False
1 False
2 False
3 False
4 False
5 False
a.isnull().sum()
#Output
#0 0
#dtype: int64
What am I doing wrong?
Upvotes: 4
Views: 5722
Reputation: 4855
The '' in your list isn't a null value, it's an empty string. To get a null, use None instead. The pandas.isnull() documentation describes missing values as "NaN in numeric arrays, [or] None/NaN in object arrays".
import pandas as pd
a = ['america','britain','brazil',None,'china','jamaica']
a = pd.DataFrame(a)
a.isnull()
0
0 False
1 False
2 False
3 True
4 False
5 False
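The same distinction can be checked on individual values. The following is a minimal sketch (the `checks` dictionary is only an illustrative name) showing what pd.isnull() is expected to report for an empty string, None and NaN:

```python
import numpy as np
import pandas as pd

# Scalar checks: only None and NaN are treated as missing, '' is not.
checks = {
    "empty string": pd.isnull(''),
    "None": pd.isnull(None),
    "NaN": pd.isnull(np.nan),
}
for label, result in checks.items():
    print(label, result)
```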
You can see the difference by printing the two dataframes. In the first case, the dataframe looks like:
pd.DataFrame(['america','britain','brazil','','china','jamaica'])
0
0 america
1 britain
2 brazil
3
4 china
5 jamaica
Notice that the value at index 3 is an empty string.
In the second case, you get:
pd.DataFrame(['america','britain','brazil',None,'china','jamaica'])
0
0 america
1 britain
2 brazil
3 None
4 china
5 jamaica
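If the data already contains empty strings and you cannot change how it is built, one possible approach (a sketch, not the only way) is to convert the empty strings to NaN first and then count with isnull(). The names `df`, `cleaned` and `null_counts` are chosen here for illustration only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['america', 'britain', 'brazil', '', 'china', 'jamaica'])

# Turn empty strings into NaN so that isnull() can detect them.
cleaned = df.replace('', np.nan)

# Count missing values per column.
null_counts = cleaned.isnull().sum()
print(null_counts)
```

Note that replace('', np.nan) only matches strings that are exactly empty; a value such as ' ' (a single space) would be left untouched.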
Upvotes: 1
Reputation: 294278
The other posts explained that '' is not a null value and therefore isn't counted as such by the isnull method. However, '' does evaluate to False when interpreted as a bool.
a.astype(bool)
0
0 True
1 True
2 True
3 False
4 True
5 True
This might be useful if you have '' in your dataframe and want to process it this way.
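As a rough sketch of how that could be turned into a count, the boolean frame can be inverted so that True marks the empty entries. The names `is_empty` and `empty_counts` are illustrative, and the column is assumed to have object dtype (set explicitly here):

```python
import pandas as pd

# dtype=object is set explicitly so the values stay plain Python strings.
df = pd.DataFrame(
    ['america', 'britain', 'brazil', '', 'china', 'jamaica'], dtype=object
)

# astype(bool) maps '' to False; invert it so that True marks empty entries.
is_empty = ~df.astype(bool)

# Summing booleans counts the True values per column.
empty_counts = is_empty.sum()
print(empty_counts)
```

Keep in mind that truthiness is not the same as missingness: in plain Python, bool(None) is False but bool(float('nan')) is True, so this approach would be expected to flag None but not NaN.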
Upvotes: 0
Reputation: 13274
If you want '', None and NaN to all count as null, you can use the applymap method to coerce each value in the dataframe to a boolean and then call .sum on the result:
import pandas as pd
import numpy as np
a = ['america','britain','brazil',None,'', np.nan, 'china','jamaica'] #I deliberately introduce a NULL value
a = pd.DataFrame(a)
a.applymap(lambda x: not x or pd.isnull(x)).sum()
# 0 3
# dtype: int64
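Two caveats may be worth noting. The `not x` test also treats other falsy values such as 0 and False as null, and DataFrame.applymap is deprecated in recent pandas releases (2.1 and later) in favor of DataFrame.map. A possible alternative, sketched below with a hypothetical helper named `is_blank`, checks explicitly for empty strings and missing values and uses Series.map on each column:

```python
import numpy as np
import pandas as pd

def is_blank(value):
    """Hypothetical helper: True for None, NaN, or an empty string."""
    if isinstance(value, str):
        return value == ''
    return bool(pd.isnull(value))

# The 0 is included to show that it is not counted as null.
df = pd.DataFrame(['america', None, '', np.nan, 0, 'china'], dtype=object)

# Series.map applies the helper to every value in each column.
blank_counts = df.apply(lambda col: col.map(is_blank)).sum()
print(blank_counts)
```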
I hope this helps.
Upvotes: 1