william007

Reputation: 18545

Counting null as percentage

Is there a fast way to automatically generate the null percentage for each column and output the result as a table?

E.g., if a column has 40 rows, 10 of which are null, the result should be 10/40.

I used the following code, but it does not work (no values are shown): [image]

Upvotes: 3

Views: 5439

Answers (2)

Zero

Reputation: 76927

You could use df.count(), which returns the number of non-null values in each column:

In [56]: df
Out[56]:
     a    b
0  1.0  NaN
1  2.0  1.0
2  NaN  NaN
3  NaN  NaN
4  5.0  NaN

In [57]: 1 - df.count()/len(df.index)
Out[57]:
a    0.4
b    0.8
dtype: float64
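
To make the reasoning explicit: since count() only counts non-null entries, one minus its share of the row count is the null fraction. Here is a minimal sketch of that check, assuming the same five-row frame as above. The variable names (non_null, null_frac_count, null_frac_mean) are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same example frame as in the session above.
df = pd.DataFrame({"a": [1, 2, np.nan, np.nan, 5],
                   "b": [np.nan, 1, np.nan, np.nan, np.nan]})

# count() gives the number of non-null values per column.
non_null = df.count()

# Null fraction derived from the non-null count.
null_frac_count = 1 - non_null / len(df.index)

# Equivalent result from the boolean mask: the mean of True/False is the share of nulls.
null_frac_mean = df.isnull().mean()

print(null_frac_count)
```

Both routes should agree up to floating-point rounding; which one to prefer is mostly a matter of speed and readability.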

Timings: count() is noticeably faster than isnull().sum(), roughly 5x on a 50,000-row frame:

In [68]: df.shape
Out[68]: (50000, 2)

In [69]: %timeit 1 - df.count()/len(df.index)
1000 loops, best of 3: 542 µs per loop

In [70]: %timeit  df.isnull().sum()/df.shape[0]
100 loops, best of 3: 2.87 ms per loop

Upvotes: 6

EdChum

Reputation: 394071

IIUC then you can use isnull with sum and then divide by the number of rows:

In [12]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[1,2,np.nan,np.nan,5], 'b':[np.nan,1,np.nan,np.nan,np.nan]})
df

Out[12]:
     a    b
0  1.0  NaN
1  2.0  1.0
2  NaN  NaN
3  NaN  NaN
4  5.0  NaN

In [14]:    
df.isnull().sum()/df.shape[0]

Out[14]:
a    0.4
b    0.8
dtype: float64
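
Since the question asks for the result as a table, the same isnull approach can be extended into a small summary frame. This is only a sketch under assumptions: the column names null_count, total_rows and null_pct, and the variable null_table, are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Same example frame as in the session above.
df = pd.DataFrame({"a": [1, 2, np.nan, np.nan, 5],
                   "b": [np.nan, 1, np.nan, np.nan, np.nan]})

# One row per original column, starting with the number of nulls.
null_table = df.isnull().sum().to_frame(name="null_count")

# Total number of rows, repeated for every column of df.
null_table["total_rows"] = len(df)

# Null share expressed as a percentage.
null_table["null_pct"] = 100 * null_table["null_count"] / null_table["total_rows"]

print(null_table)
```

Keeping the raw count and the row total next to the percentage makes the "10/40" style ratio from the question easy to read off as well.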

Upvotes: 4
