Reputation: 67
I tried this, but I'm not sure it's the best way to get information about columns with missing values. For example, I use the target labels to break down the missing values per class and get a much better view of their distribution:
cols = dataframe.columns.values.tolist()
dfnas = pd.DataFrame()
for col in cols:
    # count the label values of the rows where this column is missing
    dfnas[col] = dataframe.label[dataframe[col].isnull()].value_counts()
This is the result of that snippet
In [6]: dfnas
Out[6]:
    id   f1   f2   f3   f4   f5   f6
0  NaN  NaN  NaN  180  100  NaN  NaN
1  NaN  NaN  NaN    1    1  NaN  NaN
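A minimal self-contained sketch of the same idea, using a small made-up DataFrame (the column names `label`, `f1`, `f2` here are illustrative, not the asker's actual data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'label' is the target, f1/f2 contain missing values
df = pd.DataFrame({
    'label': [0, 0, 1, 1, 1],
    'f1': [np.nan, 2.0, np.nan, np.nan, 5.0],
    'f2': [1.0, np.nan, 3.0, 4.0, np.nan],
})

# For each feature, count how many missing values fall in each label class
dfnas = pd.DataFrame({col: df.label[df[col].isnull()].value_counts()
                      for col in ['f1', 'f2']})
print(dfnas)
```

Each cell of `dfnas` is the number of rows of that label whose feature value is missing, which is exactly the per-class breakdown shown above.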
Upvotes: 1
Views: 534
Reputation: 36555
You could use np.sum to get the counts for each column:
import numpy as np
import pandas as pd
df = pd.DataFrame({'c1':[1, np.nan, np.nan], 'c2':[2, 2, np.nan]})
np.sum(df.isnull())
Out[4]:
c1 2
c2 1
dtype: int64
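Equivalently, you can call the DataFrame's own sum method on the boolean mask (same data as above); summing booleans counts the `True` values per column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': [1, np.nan, np.nan], 'c2': [2, 2, np.nan]})

# isnull() yields a boolean DataFrame; sum() counts True per column
counts = df.isnull().sum()
print(counts)  # c1: 2, c2: 1
```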
Upvotes: 1