Reputation: 178
I'm trying to parse a CSV file whose fields are quoted with "' (a double quote outside and a single quote inside).
So basically, the file looks like this:
```
"'test1'","'test2'","'test3'","'test4'"
"'value1'","'value2'",,"'value4'"
```
My attempt to parse it is the following:
```python
import csv
from pprint import pprint

inputCsv = "test.csv"
with open(inputCsv, 'r', newline='') as csvfile:
    dictReader = csv.DictReader(csvfile, quotechar='"', delimiter=',',
                                quoting=csv.QUOTE_ALL, doublequote=True)
    for line in dictReader:
        pprint(line)
        # print(line["'test1'"])  # works, but only with "'test1'", not "test1" or 'test1'; also the result is 'value1', not value1
```
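For reference, the same reader settings applied to the two sample lines show what the parser actually produces. This sketch feeds the data through io.StringIO instead of test.csv (an assumption for illustration): only the outer double quotes are consumed, so the single quotes survive in both keys and values.

```python
import csv
import io

# The two sample lines from the question; io.StringIO stands in for test.csv (assumption).
SAMPLE = '''"'test1'","'test2'","'test3'","'test4'"
"'value1'","'value2'",,"'value4'"
'''

# Same reader settings as in the question.
dictReader = csv.DictReader(io.StringIO(SAMPLE), quotechar='"', delimiter=',',
                            quoting=csv.QUOTE_ALL, doublequote=True)
rows = list(dictReader)

# Prints the first row; every key and value still carries its single quotes,
# e.g. the key "'test1'" maps to "'value1'".
print(rows[0])
```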
I want the key to be test1, so I can access it with line["test1"] (instead of line["'test1'"]), and the value to be value1, without the additional quotes.
Is this possible without going over the whole dict and removing the quotes for each element after parsing?
Online Example: https://repl.it/repls/WoefulDeafeningMacroinstruction
Upvotes: 1
Views: 1608
Reputation: 77892
You can define your own reader to fix the issue during iteration (warning: untested code, but it should at least get you started):
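```python
import csv

class MyReader:  # csv.reader is a function, not a class, so wrap it instead of subclassing
    def __init__(self, f, *args, **kwds):
        self._reader = csv.reader(f, *args, **kwds)
    def __iter__(self):
        return self
    def __next__(self):  # strip the inner single quotes from every field
        return [value.strip("'") for value in next(self._reader)]
    @property
    def line_num(self):  # csv.DictReader reads this after every row
        return self._reader.line_num

class MyDictReader(csv.DictReader):
    def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                 dialect="excel", *args, **kwds):
        super().__init__(f, fieldnames, restkey, restval, dialect, *args, **kwds)
        self.reader = MyReader(f, dialect, *args, **kwds)  # DictReader reads lazily, so swapping here is safe
```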
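A lighter variant of the same idea (stripping the quotes as each row is read, rather than after the dicts are built) is to wrap csv.reader in a generator and build the dicts yourself. This is a sketch, not the answer's code: the helper name dequoted_dict_rows is hypothetical, io.StringIO stands in for test.csv, and unlike csv.DictReader it has no restkey/restval handling for rows whose length differs from the header.

```python
import csv
import io

def dequoted_dict_rows(f, **fmtparams):
    """Hypothetical helper: yield one dict per data row, with the inner
    single quotes stripped from both header names and values."""
    # csv.reader removes the outer double quotes; strip("'") removes the inner ones.
    rows = ([value.strip("'") for value in row] for row in csv.reader(f, **fmtparams))
    header = next(rows, [])  # the first row becomes the keys
    for row in rows:
        # zip() stops at the shorter of header/row: extra fields are dropped and
        # missing ones are simply absent (no restkey/restval as in DictReader).
        yield dict(zip(header, row))

SAMPLE = '''"'test1'","'test2'","'test3'","'test4'"
"'value1'","'value2'",,"'value4'"
'''

for line in dequoted_dict_rows(io.StringIO(SAMPLE)):
    print(line["test1"])  # value1
```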
Upvotes: 3
Reputation: 198314
This is a bit roundabout, but if we parse the data as CSV twice, we get what we want:
```python
import csv
from pprint import pprint
from io import StringIO

inputCsv = "test.csv"
with open(inputCsv, 'r', newline='') as csvfile:
    csvReader = csv.reader(csvfile, quotechar='"', delimiter=',')
    dequotedStringIO = StringIO()
    csvWriter = csv.writer(dequotedStringIO, quoting=csv.QUOTE_NONE)
    csvWriter.writerows(csvReader)

dequotedLines = dequotedStringIO.getvalue().splitlines()
dictReader = csv.DictReader(dequotedLines, quotechar="'")
for line in dictReader:
    print(line['test1'])
```
So first, a plain csv.reader parses the outer double quotes; then we send all the data back through a plain csv.writer and tell it never to quote anything. In effect, this strips the outer double quotes in a way that respects CSV semantics, and you're left with a compliant CSV file that uses only single quotes, which you can pass to csv.DictReader with quotechar="'" for the desired end result.
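Packaged as a reusable function, the two-pass approach might look like the sketch below. The function name read_dequoted is hypothetical, and io.StringIO stands in for test.csv so the example runs without a file on disk.

```python
import csv
import io

def read_dequoted(f):
    """Hypothetical wrapper around the two-pass approach: return a list of dicts
    whose keys and values have both quote layers removed."""
    buffer = io.StringIO()
    # Pass 1: a plain reader consumes the outer double quotes, and QUOTE_NONE
    # writes the fields back unquoted, e.g. 'test1','test2',...
    csv.writer(buffer, quoting=csv.QUOTE_NONE).writerows(csv.reader(f, quotechar='"'))
    # Pass 2: the intermediate text is now an ordinary CSV quoted with '.
    return list(csv.DictReader(buffer.getvalue().splitlines(), quotechar="'"))

SAMPLE = '''"'test1'","'test2'","'test3'","'test4'"
"'value1'","'value2'",,"'value4'"
'''

for line in read_dequoted(io.StringIO(SAMPLE)):
    print(line["test1"])  # value1
```

One caveat: because the intermediate writer uses csv.QUOTE_NONE with no escapechar, a value that contains a comma makes writerows raise csv.Error.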
Upvotes: 1