Reputation: 31
In Python 3 I'm using the csv package
import csv
to read sparse comma-separated CSV files. Among the elements are some that are empty strings
...,"",...
and some that are null
...,,...
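For example, parsed with csv.reader, both forms come back as the same empty string:

```python
import csv
import io

# Both the quoted empty string and the bare (null) field parse to ''.
row = next(csv.reader(io.StringIO('a,"",b,,c')))
print(row)  # → ['a', '', 'b', '', 'c']
```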
I'd like to distinguish between those. How can one do that?
Upvotes: 3
Views: 2140
Reputation: 351
If you substitute a safe placeholder (one that won't match any existing string in the CSV file) for each pair of double quotes ("") before parsing, you can substitute back after parsing and identify fields that consist of a single occurrence of the placeholder:
import csv

lines = [
    '"row1_col1","row1_col2 with ""embedded"" quotes",3,"",,"row1_col6"',
    '"row2_col1","row2_col2",3,"",,"row2_col6"',
]

lines = [line.replace('""', '__PAIR_OF_DOUBLE_QUOTES__') for line in lines]

csv_reader = csv.reader(lines, delimiter=',')
rows = []
for row in csv_reader:
    for col in range(len(row)):
        # a truly empty field is null/None
        if row[col] == '':
            row[col] = None
        # a field holding just a pair of double quotes is the empty string
        elif row[col] == '__PAIR_OF_DOUBLE_QUOTES__':
            row[col] = ''
        # otherwise restore any embedded escaped quotes
        else:
            row[col] = row[col].replace('__PAIR_OF_DOUBLE_QUOTES__', '"')
    rows.append(row)
This results in the following output:
>>> import json
>>> print(json.dumps(rows))
[
["row1_col1", "row1_col2 with \"embedded\" quotes", "3", "", null, "row1_col6"],
["row2_col1", "row2_col2", "3", "", null, "row2_col6"]
]
Upvotes: 1
Reputation: 31
I gather that this simply isn't possible with the csv module, so I'll continue to pre-process CSV files to insert a dummy value for "" (I sure love me some Emacs), then do my deeds with Python, detecting the dummy value and handling it as the empty string and empty values as null. Problem solved, by renting a place to live in Kludge City.
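For the record, that pre-processing can also be done in Python rather than Emacs. A minimal sketch, assuming the sentinel value (hypothetical name below) never occurs in the real data and that fields contain no ""-escaped embedded quotes:

```python
import csv
import io

SENTINEL = '__EMPTY__'  # hypothetical dummy value; must not occur in the data

raw = 'a,"",c\n,"x",\n'
# Stand in for each quoted empty string before parsing
# (caveat: this would also hit ""-escaped quotes inside fields).
patched = raw.replace('""', '"%s"' % SENTINEL)

rows = []
for row in csv.reader(io.StringIO(patched)):
    # bare empty field -> None (null); sentinel -> empty string
    rows.append([None if f == '' else '' if f == SENTINEL else f for f in row])

print(rows)  # → [['a', '', 'c'], [None, 'x', None]]
```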
Upvotes: 0
Reputation: 425
This answer suggests it is not possible using the csv module. However, pandas' read_csv() has a number of arguments (such as na_values and keep_default_na) that control what is read as NULL versus "".
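A short sketch of those parameters (na_values and keep_default_na are documented read_csv arguments): by default pandas converts empty fields to NaN, while keep_default_na=False with no na_values reads them back as strings:

```python
import io

import pandas as pd

csv_text = 'a,b,c\n"x","",\n'  # b is a quoted empty string, c is a bare empty field

# Default: the built-in NA values (which include the empty string) apply,
# so empty fields are read as NaN.
df_default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False with no na_values: no strings are parsed as NaN,
# so empty fields come back as empty strings instead.
df_strings = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
```

Note that in this sketch the quoted "" and the bare empty field come out the same either way; telling them apart may still require pre-processing along the lines of the earlier answer.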
Upvotes: 0