Reputation: 1510
I have a weird CSV that has "null" as a value, as well as empty cells as values.
So my row looks like this:
null,0,0,0,1,,,,0,0,0,null
I'm doing nothing but reading and rewriting a file:
import csv
import pandas as pd

f = pd.read_csv(input_file, sep=',', quotechar='"', engine='python',
                converters={i: str for i in range(0, 155)})
f.to_csv(output_file, sep=',', escapechar=' ', quotechar='"',
         quoting=csv.QUOTE_MINIMAL, index=False)
The above is technically just "renaming" a file, but it's meant to prove that I'm getting the same values for the whole file. Later I'll trim certain columns (which I already did, but then found out the data was wrong).
So, how do I preserve both "null" and the empty string "" without reading each row/column and writing a function that checks every single cell?
Is that even possible?
I tried na_rep="null", but then all '' values are converted to 'null'; and if I drop them instead, I lose my 'null' values.
Upvotes: 3
Views: 4598
Reputation: 18647
To ignore the handling of NaN values altogether, you can pass the na_filter=False argument to the read_csv method:
f = pd.read_csv(input_file, sep=',',
                quotechar='"',
                engine='python',
                converters={i: str for i in range(0, 155)},
                na_filter=False)
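To make the effect concrete, here is a minimal, self-contained sketch of the round trip from the question (the in-memory buffers and the c0..c11 header names are mine, standing in for your real files and columns):

import csv
import io

import pandas as pd

# The sample row from the question, wrapped in an in-memory "file"
# with hypothetical header names c0..c11.
data = "c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11\nnull,0,0,0,1,,,,0,0,0,null\n"

f = pd.read_csv(io.StringIO(data), sep=',', quotechar='"',
                engine='python',
                converters={i: str for i in range(0, 12)},
                na_filter=False)

out = io.StringIO()
f.to_csv(out, sep=',', quotechar='"', quoting=csv.QUOTE_MINIMAL, index=False)

# Both the literal 'null' strings and the empty cells survive unchanged.
print(out.getvalue().splitlines()[1])  # null,0,0,0,1,,,,0,0,0,null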
Alternatively, if you require a bit more control over how values are handled, you can exclude the defaults and pass in your own list of na_values. For example:
f = pd.read_csv(input_file, sep=',',
                quotechar='"',
                engine='python',
                converters={i: str for i in range(0, 155)},
                na_values=['nan', 'NaN'],
                keep_default_na=False)
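As a quick sanity check (the column names and sample values here are made up for illustration), with keep_default_na=False only the strings you explicitly list in na_values are coerced, so 'null' and '' pass through untouched:

import io

import pandas as pd

data = "a,b,c,d\nnull,,nan,NaN\n"

f = pd.read_csv(io.StringIO(data),
                na_values=['nan', 'NaN'],
                keep_default_na=False)

# Only 'nan' and 'NaN' were coerced to NaN; 'null' and '' stay as strings.
print(f.iloc[0].tolist())  # ['null', '', nan, nan]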
The default values that will automatically be coerced to NaN are:
default_na_values = ['', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',
'1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null']
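Note that the exact set varies by pandas version (newer releases also treat 'None' and '<NA>' as NA by default), so check the read_csv documentation for the version you're using.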
Upvotes: 5