user6416338

Python Dataframe get null value counts

I am trying to count the null values in a DataFrame. I reviewed the following Stack Overflow post, which describes how to do this, but I can't get the same approach to work on my dataset.

How to count the Nan values in the column in Panda Data frame

Working code:

import pandas as pd
a = ['america','britain','brazil','','china','jamaica'] #I deliberately introduce a NULL value
a = pd.DataFrame(a)
a.isnull()

#Output:
       0
0  False
1  False
2  False
3  False
4  False
5  False

a.isnull().sum()
#Output
#0    0
#dtype: int64

What am I doing wrong?

Upvotes: 4

Views: 5722

Answers (3)

Craig

Reputation: 4855

The '' in your list isn't a null value, it's an empty string. To get a null, use None instead. The pandas.isnull() documentation describes missing values as "NaN in numeric arrays, [or] None/NaN in object arrays".

import pandas as pd
a = ['america','britain','brazil',None,'china','jamaica']
a = pd.DataFrame(a)
a.isnull()

       0
0  False
1  False
2  False
3   True
4  False
5  False

You can see the difference by printing the two dataframes. In the first case, the dataframe looks like:

pd.DataFrame(['america','britain','brazil','','china','jamaica'])

         0
0  america
1  britain
2   brazil
3         
4    china
5  jamaica

Notice that the value at index 3 is an empty string.

In the second case, you get:

pd.DataFrame(['america','britain','brazil',None,'china','jamaica'])

         0
0  america
1  britain
2   brazil
3     None
4    china
5  jamaica
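
If the empty strings are already in your data and you can't change how it is built, one option is to convert them to NaN before counting. This is only a minimal sketch, assuming a blank entry is exactly '' (the names df, cleaned, empty_count and null_count are just for illustration):

```python
import numpy as np
import pandas as pd

# Same data as in the question, with an empty string at index 3.
df = pd.DataFrame(['america', 'britain', 'brazil', '', 'china', 'jamaica'])

# Count empty strings directly; they are not nulls, so isnull() ignores them.
empty_count = (df == '').sum()

# Replace empty strings with NaN so that isnull() treats them as missing.
cleaned = df.replace('', np.nan)
null_count = cleaned.isnull().sum()

print(empty_count)
print(null_count)
```

Whitespace-only strings such as ' ' would need their own handling, since this only matches ''.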

Upvotes: 1

piRSquared

Reputation: 294278

The other answers explain that '' is not a null value and therefore isn't counted as such by the isnull method...

...However, '' does evaluate to False when interpreted as a bool.

a.astype(bool)

       0
0   True
1   True
2   True
3  False
4   True
5   True

This might be useful if you have '' in your dataframe and want to process it this way.
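
To turn that into a count, you could negate the boolean frame and sum it. A rough sketch, assuming an object-dtype column (set explicitly here) and a DataFrame named a as above; falsy_count is a name I made up:

```python
import pandas as pd

# Object dtype is set explicitly so each value keeps its Python truthiness.
a = pd.DataFrame(['america', 'britain', 'brazil', '', None, 'china'], dtype=object)

# astype(bool) maps '' and None to False; negating marks them as "blank".
falsy_count = (~a.astype(bool)).sum()
print(falsy_count)

# Caveat: NaN is truthy in Python, so this check alone would not flag it.
nan_is_truthy = bool(float('nan'))
print(nan_is_truthy)
```

Because NaN is truthy, you would still need isnull() if the column can contain NaN. Also keep in mind that 0 and False are falsy too, so this only makes sense for string columns.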

Upvotes: 0

Abdou

Reputation: 13274

If you want '', None and NaN to all count as null, you can use the applymap method to test whether each value in the dataframe is falsy or null, and then call .sum on the result:

import pandas as pd
import numpy as np


a = ['america','britain','brazil',None,'', np.nan, 'china','jamaica'] #I deliberately introduce a NULL value
a = pd.DataFrame(a)
a.applymap(lambda x: not x or pd.isnull(x)).sum()

# 0    3
# dtype: int64
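
Note that DataFrame.applymap is deprecated as of pandas 2.1 in favour of DataFrame.map, and that not x also treats 0 and False as null. A sketch of two alternatives under those caveats: a vectorized version that only matches nulls and '', and an elementwise version that uses DataFrame.map when it exists. The helper is_blank and the variable names are hypothetical:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    ['america', 'britain', 'brazil', None, '', np.nan, 'china', 'jamaica'],
    dtype=object,
)

# Vectorized: a value is blank if it is null or exactly the empty string.
blank_count = (a.isnull() | a.eq('')).sum()

def is_blank(x):
    # Hypothetical helper: True for None, NaN and ''.
    return bool(pd.isnull(x)) or x == ''

# Elementwise: prefer DataFrame.map (pandas >= 2.1), else fall back to applymap.
elementwise = a.map if hasattr(a, 'map') else a.applymap
blank_count_elementwise = elementwise(is_blank).sum()

print(blank_count)
print(blank_count_elementwise)
```

Both counts should agree on this data; the vectorized form is usually the better choice on larger frames.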

I hope this helps.

Upvotes: 1
