djc
djc

Reputation: 31

How to distinguish between the empty string "" and null input?

In Python 3 I'm using the csv package

import csv

to read sparse comma-separated CSV files. Among the elements are some that are empty strings

...,"",...

and some that are null

...,,...

I'd like to distinguish between those. How can one do that?

Upvotes: 3

Views: 2140

Answers (3)

ZachL
ZachL

Reputation: 351

If you substitute a safe placeholder (one that won't match an existing string in the csv file) for a bare pair of double quotes ("") before parsing, you can then substitute back once parsed and identify strings that have a single occurrence of the placeholder:

import csv
lines = [
    '"row1_col1","row1_col2 with ""embedded"" quotes",3,"",,"row1_col6"',
    '"row2_col1","row2_col2",3,"",,"row2_col6"',
]
lines = [line.replace('""', '__PAIR_OF_DOUBLE_QUOTES__') for line in lines]
csv_reader = csv.reader(lines, delimiter=',')
rows = []
for row in csv_reader:
    for col in range(len(row)):
        # empty string is null/none
        if row[col] == '':
            row[col] = None
        # string with just a pair of double quotes is the empty string
        elif row[col] == '__PAIR_OF_DOUBLE_QUOTES__':
            row[col] = ''
        else:
            row[col] = row[col].replace('__PAIR_OF_DOUBLE_QUOTES__', '"')
    rows.append(row)

This results in the following output:

>>> print(json.dumps(rows))
[
    ["row1_col1", "row1_col2 with \"embedded\" quotes", "3", "", null, "row1_col6"],
    ["row2_col1", "row2_col2", "3", "", null, "row2_col6"]
]

Upvotes: 1

djc
djc

Reputation: 31

I gather that this simply isn't possible with the csv module, so I'll continue to pre-process CSV files to insert a dummy value for "" (I sure love me some Emacs), then do my deeds with Python, detecting the dummy value and handling it as the empty string and empty values as null. Problem solved, by renting a place to live in Kludge City.

Upvotes: 0

Jan Morawiec
Jan Morawiec

Reputation: 425

This answer suggests it is not possible using the csv module. However, pandas read_csv() has a number of arguments that will allow you to distinguish between NULL and "".

Upvotes: 0

Related Questions