Reputation: 865
I have a dataframe which has multiple columns. I'd like to iterate through the columns, counting for each column how many null values there are and produce a new dataframe which displays the sum of isnull
values alongside the column header names.
If I do:
for col in main_df:
print(sum(pd.isnull(data[col])))
I get a list of the null count for each column:
0
1
100
What I'm trying to do is create a new dataframe which has the column header alongside the null count, e.g.
col1 | 0
col2 | 1
col3 | 100
Upvotes: 4
Views: 68498
Reputation: 21
for a certain dataset you could use to show you the missing values and the number
dataset.isnull().sum(axis = 0)
Upvotes: 2
Reputation: 451
If the number of columns in your dataframe is greater than 10 you will end up with the middle columns being left out of the output. You can print every column using:
nulls = df.isnull().sum().to_frame()
for index, row in nulls.iterrows():
print(index, row[0])
Upvotes: 2
Reputation: 210932
Try this:
In [71]: df
Out[71]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
4 1.0 3.0 9
5 7.0 4.0 9
6 2.0 6.0 9
7 9.0 6.0 4
8 3.0 0.0 9
9 9.0 0.0 1
In [72]: pd.isnull(df).sum()
Out[72]:
a 1
b 2
c 0
dtype: int64
or:
In [76]: df.isnull().sum()
Out[76]:
a 1
b 2
c 0
dtype: int64
you can create a DF out of it:
In [78]: df.isnull().sum().to_frame('nulls')
Out[78]:
nulls
a 1
b 2
c 0
Upvotes: 17