whateverrest

Reputation: 178

How to parse double-quoted CSV files in Python?

I'm trying to parse a CSV file whose fields are quoted with "'...'", i.e. single quotes inside double quotes. So basically the file looks like this:

"'test1'","'test2'","'test3'","'test4'"
"'value1'","'value2'",,"'value4'"

My attempt to parse it is the following:

import csv
from pprint import pprint

inputCsv = "test.csv"

with open(inputCsv, 'r', newline='') as csvfile:
    dictReader = csv.DictReader(csvfile, quotechar='"', delimiter=',',
                 quoting=csv.QUOTE_ALL, doublequote=True)
    for line in dictReader:
        pprint(line)
        # print(line["'test1'"]) # works, but only with "'test1'", not "test1" or 'test1'; also result is 'value1' not value1
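For reference, here is a minimal reproduction of what the csv module does with this input (SAMPLE is a hypothetical in-memory copy of the file above). Only the outer double quotes count as CSV quoting, so the inner single quotes stay part of every field, which is why they show up in both the keys and the values.

```python
import csv
from io import StringIO

# Hypothetical in-memory copy of the sample file from the question
SAMPLE = '''"'test1'","'test2'","'test3'","'test4'"
"'value1'","'value2'",,"'value4'"
'''

# A plain csv.reader removes only the outer double quotes
rows = list(csv.reader(StringIO(SAMPLE)))
print(rows[0])  # ["'test1'", "'test2'", "'test3'", "'test4'"]
print(rows[1])  # ["'value1'", "'value2'", '', "'value4'"]
```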

I want the key to be test1, so I can access it with line["test1"] (instead of line["'test1'"]), and the value to be value1, without the extra single quotes.

Is this possible without going over the whole dict and removing the quotes for each element after parsing?

Online Example: https://repl.it/repls/WoefulDeafeningMacroinstruction

Upvotes: 1

Views: 1608

Answers (2)

bruno desthuilliers

Reputation: 77892

You can define your own reader to fix the issue during iteration (warning: untested code, but it should at least get you started):

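import csv
class MyReader:  # csv.reader is a function, not a class, so wrap it instead of subclassing
    def __init__(self, f, dialect="excel", *args, **kwds):
        self._reader = csv.reader(f, dialect, *args, **kwds)

    def __iter__(self):
        return self

    def __next__(self):  # strip the inner single quotes from every parsed field
        return [value.strip("'") for value in next(self._reader)]

    @property
    def line_num(self):  # csv.DictReader reads this attribute
        return self._reader.line_num

class MyDictReader(csv.DictReader):
    def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                 dialect="excel", *args, **kwds):
        super().__init__(f, fieldnames, restkey, restval, dialect, *args, **kwds)
        self.reader = MyReader(f, dialect, *args, **kwds)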
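If subclassing feels heavy, the same strip-while-iterating idea can also be sketched as a plain generator. This is an alternative, not the answer's code: the helper name dequoted_dict_rows and the SAMPLE string are hypothetical. Unlike csv.DictReader, it has no restkey/restval handling, because zip silently drops extra fields and leaves missing keys out.

```python
import csv
from io import StringIO


def dequoted_dict_rows(f, **fmtparams):
    """Yield one dict per data row, with single quotes stripped from keys and values."""
    # csv.reader removes the outer double quotes; strip("'") removes the inner single ones
    rows = ([field.strip("'") for field in row] for row in csv.reader(f, **fmtparams))
    header = next(rows, None)  # the first row supplies the dict keys
    if header is None:  # empty input: nothing to yield
        return
    for row in rows:
        if row:  # skip blank lines, as csv.DictReader does
            yield dict(zip(header, row))


# Hypothetical in-memory copy of the sample file from the question
SAMPLE = '''"'test1'","'test2'","'test3'","'test4'"
"'value1'","'value2'",,"'value4'"
'''

for line in dequoted_dict_rows(StringIO(SAMPLE)):
    print(line["test1"])  # value1
```

Both this and the subclass above use strip("'"), which removes every leading and trailing single quote. A value that legitimately begins or ends with an apostrophe would lose it as well.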

Upvotes: 3

Amadan

Reputation: 198314

This is a bit roundabout, but if we read the file twice as CSV, we get what we want:

import csv
from pprint import pprint
from io import StringIO

inputCsv = "test.csv"

with open(inputCsv, 'r', newline='') as csvfile:
    csvReader = csv.reader(csvfile, quotechar='"', delimiter=',')
    dequotedStringIO = StringIO()
    csvWriter = csv.writer(dequotedStringIO, quoting=csv.QUOTE_NONE)
    csvWriter.writerows(csvReader)
    dequotedLines = dequotedStringIO.getvalue().splitlines()
    dictReader = csv.DictReader(dequotedLines, quotechar="'")
    for line in dictReader:
        print(line['test1'])

So first we have a plain csv.reader that parses the outer double quotes; then we send all the data back through a plain csv.writer and tell it never to quote anything. In effect this strips the outer double quotes in a way that respects CSV semantics, and you're left with a compliant CSV file that only uses single quotes, which you can pass to csv.DictReader with quotechar="'" for the desired end result.
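The same two-pass approach can be wrapped in a small reusable function. This is a sketch that assumes the whole file fits in memory. The names dequote_dict_reader and SAMPLE are hypothetical and are not part of the csv module.

```python
import csv
from io import StringIO


def dequote_dict_reader(f):
    """Return a csv.DictReader over f with both layers of quoting removed."""
    # Pass 1: csv.reader parses the outer double quotes; csv.writer with
    # QUOTE_NONE writes the fields back out without adding any quotes
    buffer = StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_NONE).writerows(csv.reader(f))
    # Pass 2: what remains is ordinary CSV that uses single quotes as quotechar
    return csv.DictReader(buffer.getvalue().splitlines(), quotechar="'")


# Hypothetical in-memory copy of the sample file from the question
SAMPLE = '''"'test1'","'test2'","'test3'","'test4'"
"'value1'","'value2'",,"'value4'"
'''

for line in dequote_dict_reader(StringIO(SAMPLE)):
    print(line["test1"])  # value1
```

One caveat of this design is that when csv.writer runs with csv.QUOTE_NONE and no escapechar, it raises csv.Error as soon as a field contains a comma (the delimiter). The function therefore only works while the values themselves contain no commas.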

Upvotes: 1
