Reputation: 18545
Is there a fast way to automatically compute the percentage of null values for each column and output the result as a table?
E.g., if a column has 40 rows and 10 of them are null, the result should be 10/40.
I used the following code, but it does not work (no values are shown):
Upvotes: 3
Views: 5439
Reputation: 76927
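You could use df.count(), which returns the number of non-null values in each column. Dividing by the row count and subtracting from 1 gives the null fraction: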
In [56]: df
Out[56]:
     a    b
0  1.0  NaN
1  2.0  1.0
2  NaN  NaN
3  NaN  NaN
4  5.0  NaN
In [57]: 1 - df.count()/len(df.index)
Out[57]:
a    0.4
b    0.8
dtype: float64
Timings: on a 50,000-row frame, count() is roughly five times faster than isnull().sum()
In [68]: df.shape
Out[68]: (50000, 2)
In [69]: %timeit 1 - df.count()/len(df.index)
1000 loops, best of 3: 542 µs per loop
In [70]: %timeit df.isnull().sum()/df.shape[0]
100 loops, best of 3: 2.87 ms per loop
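Since the question asks for the result as a table, here is a minimal sketch that wraps the count() approach in a helper. The function name null_fraction_table and the column names null_count and null_fraction are hypothetical, chosen only for illustration; the sample frame is the one from this answer.

```python
import numpy as np
import pandas as pd


def null_fraction_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column of df with its null count and null fraction.

    Hypothetical helper for illustration; not part of pandas.
    """
    n_rows = len(df.index)
    # DataFrame.count() gives the number of non-null values per column.
    null_count = n_rows - df.count()
    # Note: for an empty frame (n_rows == 0) the fraction is NaN.
    return pd.DataFrame({
        "null_count": null_count,
        "null_fraction": null_count / n_rows,
    })


df = pd.DataFrame({
    "a": [1, 2, np.nan, np.nan, 5],
    "b": [np.nan, 1, np.nan, np.nan, np.nan],
})
table = null_fraction_table(df)
print(table)
```

Multiplying the null_fraction column by 100 would give a percentage if that is the preferred presentation.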
Upvotes: 6
Reputation: 394071
IIUC, you can use isnull with sum and then divide by the number of rows:
In [12]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, 5], 'b': [np.nan, 1, np.nan, np.nan, np.nan]})
df
Out[12]:
     a    b
0  1.0  NaN
1  2.0  1.0
2  NaN  NaN
3  NaN  NaN
4  5.0  NaN
In [14]:
df.isnull().sum()/df.shape[0]
Out[14]:
a    0.4
b    0.8
dtype: float64
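As a cross-check, the sketch below computes the null fraction three ways on the same sample frame: isnull().sum() divided by the row count (as above), the count()-based form from the other answer, and isnull().mean(), which is a shorthand not mentioned in the thread (the mean of a boolean column is the fraction of True values). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, np.nan, np.nan, 5],
    "b": [np.nan, 1, np.nan, np.nan, np.nan],
})

# Sum of the boolean null mask divided by the number of rows.
via_sum = df.isnull().sum() / df.shape[0]

# One minus the fraction of non-null values.
via_count = 1 - df.count() / len(df.index)

# Mean of the boolean null mask; equivalent to the sum-based form.
via_mean = df.isnull().mean()

# All three should agree up to floating-point rounding.
agree = np.allclose(via_sum, via_count) and np.allclose(via_sum, via_mean)
print(via_mean)
print("all methods agree:", agree)
```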
Upvotes: 4