Balkyto

Reputation: 1510

Pandas keep 'null' and ' ' when loading CSV

I have a weird CSV that has "null" as a value, as well as empty cells.

So my row looks like this:

null,0,0,0,1,,,,0,0,0,null

I'm doing nothing but reading and rewriting a file:

f = pd.read_csv(input_file,sep=',', quotechar='"', engine='python', converters={i: str for i in range(0, 155)})
f.to_csv(output_file, sep=',', escapechar=' ', quotechar='"', quoting=csv.QUOTE_MINIMAL, index=False)

The above is technically just "renaming" a file, but it proves that I get the same values for the whole file; later I'll trim certain columns (which I already did, but then discovered the data was wrong).

So, how do I preserve both "null" and the empty value without reading each row/column and writing a function that checks every single cell?

Is that even possible?

I tried na_rep="null", but then all '' are converted to 'null'; if I drop them instead, I lose my 'null' values.

Upvotes: 3

Views: 4598

Answers (1)

Chris Adams

Reputation: 18647

To skip NaN handling altogether, you can pass the na_filter=False argument to the read_csv method:

f = (pd.read_csv(input_file,sep=',',
                 quotechar='"',
                 engine='python',
                 converters={i: str for i in range(0, 155)},
                 na_filter=False))
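As a self-contained sketch (with a hypothetical sample row modeled on the question's data, using an in-memory buffer instead of real files), this shows the round trip preserving both the literal "null" strings and the empty cells:

```python
import io
import pandas as pd

# Hypothetical row mirroring the question: literal "null" strings and
# genuinely empty cells must both survive a read/write cycle.
csv_data = "a,b,c,d\nnull,0,,null\n"

# na_filter=False disables all NaN detection, so every cell is read
# exactly as it appears; dtype=str keeps values as strings.
df = pd.read_csv(io.StringIO(csv_data), na_filter=False, dtype=str)

print(df.iloc[0].tolist())  # ['null', '0', '', 'null']

# Writing back reproduces the original row: "null" stays "null",
# the empty cell stays empty.
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

The same na_filter=False applies unchanged when reading from a real file path instead of a StringIO buffer.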

Alternatively, if you require a bit more control over how values are handled, you can exclude defaults and pass in your own list of na_values. For example:

f = (pd.read_csv(input_file,sep=',',
                 quotechar='"',
                 engine='python',
                 converters={i: str for i in range(0, 155)},
                 na_values=['nan', 'NaN'],
                 keep_default_na=False))
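A minimal sketch of the selective variant, assuming a hypothetical row that mixes "null", an empty cell, and an explicit "nan": only the values you list are coerced to NaN, while "null" and '' pass through untouched.

```python
import io
import pandas as pd

# Hypothetical data mixing "null", an empty cell, and an explicit "nan".
csv_data = "a,b,c\nnull,,nan\n"

# keep_default_na=False drops pandas' built-in NaN list; only the
# values in na_values are treated as missing.
df = pd.read_csv(io.StringIO(csv_data),
                 na_values=['nan', 'NaN'],
                 keep_default_na=False,
                 dtype=str)

# 'null' and '' are kept as strings; only 'nan' becomes NaN.
print(df.iloc[0].tolist())
```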

The default values that are automatically coerced to NaN are:

default_na_values = ['', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',
                     '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null']

Upvotes: 5
