WLC

Reputation: 127

Counting number of "NaN" (not zeros or blanks) in csv

Is it possible to have Python count the occurrences of 'NaN' (as a literal string) in a CSV file? I tried pandas' read_csv, but blank cells in some columns are also read as NaN. The only method I know that works is Excel's Find, searching for 'NaN' in values.

Does anyone know of other methods? Thanks in advance!

Upvotes: 4

Views: 7832

Answers (3)

jordan carrey

Reputation: 11

df.isna().sum()

This lists the number of NaNs per column.
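Note that with pandas' default parsing, blank cells are counted as well. The following is a minimal sketch of the difference, assuming a small inline CSV (hypothetical data, not the asker's file) and assuming only the literal text NaN should count as missing:

```python
import io
import pandas as pd

# Hypothetical sample data: two literal "NaN" cells and two blank cells.
csv_text = "a,b,c\nNaN,1,\n2,,NaN\n5,6,9\n"

# Default parsing: blanks and the text "NaN" both become missing values.
df_default = pd.read_csv(io.StringIO(csv_text))
default_counts = df_default.isna().sum()

# Strict parsing: only the literal text "NaN" is treated as missing;
# blank cells are kept as empty strings.
df_strict = pd.read_csv(
    io.StringIO(csv_text), na_values=["NaN"], keep_default_na=False
)
strict_counts = df_strict.isna().sum()

print(default_counts.to_dict())
print(strict_counts.to_dict())
```

The per-column counts only match the number of literal NaN strings in the second case.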

Upvotes: 1

piRSquared

Reputation: 294318

Setup
Consider a csv file named tst.csv that looks like this:

h1,h2,h3
NaN,1,
2,3,NaN
5,6,9
NaN,1,
2,3,NaN
5,6,9

Solution
Use open and str.count

with open('tst.csv') as f:
    c = f.read().count('NaN')

print(c)

4
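One caveat: str.count matches substrings, so text such as "NaNs" or "-NaN" would be counted too. If only cells whose entire content is NaN should count, a sketch using the standard csv module might look like this (count_exact_cells is a hypothetical helper name, and the inline string stands in for tst.csv from the setup above):

```python
import csv
import io

def count_exact_cells(lines, target="NaN"):
    """Count cells whose full text equals `target` (hypothetical helper)."""
    reader = csv.reader(lines)
    return sum(cell == target for row in reader for cell in row)

# Inline copy of the tst.csv contents shown in the setup.
sample = io.StringIO(
    "h1,h2,h3\n"
    "NaN,1,\n"
    "2,3,NaN\n"
    "5,6,9\n"
    "NaN,1,\n"
    "2,3,NaN\n"
    "5,6,9\n"
)
print(count_exact_cells(sample))
```

For a file on disk, the same function can be called with an open file object, for example open('tst.csv', newline='').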

Upvotes: 1

Brad Solomon

Reputation: 40888

You can use pd.read_csv, but you will need two parameters: na_values and keep_default_na.

  1. na_values:

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘nan’.

  2. keep_default_na:

If na_values are specified and keep_default_na is False the default NaN values are overridden, otherwise they’re appended to.

So in your case:

pd.read_csv('path/to/file.csv', na_values='NaN', keep_default_na=False)

If you want to be a bit more "liberal", you might use something like na_values=['nan', 'NaN']. The point is that these strings are matched very strictly.
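To illustrate how strictly the strings are matched, here is a small sketch with inline data (hypothetical, not from the question's file) comparing a single-string na_values with the more liberal list:

```python
import io
import pandas as pd

# Hypothetical data: one "NaN", one "nan", one "NA", and one blank cell.
csv_text = "x,y\nNaN,nan\nNA,\nfoo,bar\n"

# Only the exact text "NaN" is treated as missing.
strict = pd.read_csv(
    io.StringIO(csv_text), na_values=["NaN"], keep_default_na=False
)

# Both "nan" and "NaN" are treated as missing; "NA" and blanks are not.
liberal = pd.read_csv(
    io.StringIO(csv_text), na_values=["nan", "NaN"], keep_default_na=False
)

strict_total = int(strict.isna().sum().sum())
liberal_total = int(liberal.isna().sum().sum())
print(strict_total, liberal_total)
```

Because keep_default_na=False drops the built-in list, strings such as "NA" and blank cells stay as ordinary text in both cases.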

As an example, say you have the following CSV file with one literal NaN and two blanks:

[Image: screenshot of the sample CSV file]

import pandas as pd
import numpy as np
df = pd.read_csv('input/sample.csv', na_values='NaN', keep_default_na=False)
print(np.count_nonzero(df.isnull().values))
# 1
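Since the sample file itself is not available, the following sketch uses an inline stand-in (hypothetical contents, assumed to have one literal NaN and two blank cells) to check that the blanks survive as empty strings rather than being counted as missing:

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for sample.csv: one literal NaN and two blank cells.
csv_text = "a,b\nNaN,x\n,y\nz,\n"

df = pd.read_csv(io.StringIO(csv_text), na_values="NaN", keep_default_na=False)

# Cells parsed as missing (only the literal "NaN").
nan_count = int(np.count_nonzero(df.isnull().values))

# Cells kept as empty strings (the blanks).
blank_count = int((df == "").values.sum())

print(nan_count, blank_count)
```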

Upvotes: 5
