godzilla

Reputation: 3125

python csv reader not handling quotes

I have a file I want to parse with a CSV reader. It has 12 rows, but some of the columns contain quotes and, to make things more complicated, also commas, single quotes and newlines. The trouble is that the CSV reader does not handle the quotes correctly: quotes within quotes are treated as separate entities. Here is a small sample of what I am dealing with:

import csv

ptr = open("myfile")
text = ptr.read()
ptr.close()

for l in csv.reader(text, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(l)

The file contains:

"0","11/21/2013","NEWYORK","USA
 Atlantic ","the person replied \"this quote\" to which i was shocked,
this came as an utter surprise"

"1","10/18/2013","London","UK","please note the message \"next quote\" 
is invalid"

"2","08/11/2014","Paris","France",
"the region is in a very important geo strategic importance"

Upvotes: 2

Views: 3278

Answers (2)

Avinash Raj

Reputation: 174706

Using the re module:

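import re

with open('file') as f:
    # Records are separated by one or more blank lines.
    m = re.split(r'\n\n+', f.read())
    for line in m:
        # A field is an unescaped opening quote, then escaped quotes (\")
        # or non-quote characters, then an unescaped closing quote.
        print(re.findall(r'(?<!\\)"(?:\\"|[^"])*(?<!\\)"', line))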

Output:

['"0"', '"11/21/2013"', '"NEWYORK"', '"USA\n Atlantic "', '"the person replied \\"this quote\\" to which i was shocked,\nthis came as an utter surprise"']
['"1"', '"10/18/2013"', '"London"', '"UK"', '"please note the message \\"next quote\\" \nis invalid"']
['"2"', '"08/11/2014"', '"Paris"', '"France"', '"the region is in a very important geo strategic importance"']
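The matches above still carry their surrounding quotes and the backslash escapes. A minimal sketch of one way to clean them up, assuming (as in the question's sample) that records are separated by blank lines; FIELD_RE and parse_record are hypothetical names, and the pattern is the same one used in this answer:

```python
import re

# Same pattern as above: unescaped opening quote, then escaped quotes (\")
# or non-quote characters, then an unescaped closing quote.
FIELD_RE = re.compile(r'(?<!\\)"(?:\\"|[^"])*(?<!\\)"')


def parse_record(chunk):
    """Return the fields of one blank-line-separated record, unquoted and unescaped."""
    fields = FIELD_RE.findall(chunk)
    # Strip the surrounding quotes, then turn each \" back into a plain ".
    return [f[1:-1].replace('\\"', '"') for f in fields]


chunk = '"1","10/18/2013","London","UK","please note the message \\"next quote\\" \nis invalid"'
print(parse_record(chunk))
```

Newlines inside a field are kept as-is; only the quoting and the \" escapes are removed.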

Upvotes: 1

freakish

Reputation: 56467

You have to set escapechar in your reader:

csv.reader(..., escapechar='\\')

which is None by default (I don't know why), so without it the reader does not treat the backslash in \" as an escape.
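To see what escapechar changes, here is a minimal sketch that parses one backslash-escaped field; io.StringIO stands in for the real file, and the sample line is adapted from the question:

```python
import csv
import io

# One quoted field that uses backslash-escaped quotes, as in the question's file.
line = '"the person replied \\"this quote\\" to which i was shocked"\n'

# Default dialect: escapechar is None, so the backslashes are not treated as escapes
# and the field comes back mangled.
default_row = next(csv.reader(io.StringIO(line)))

# With escapechar set, \" inside a quoted field becomes a literal quote character.
escaped_row = next(csv.reader(io.StringIO(line), escapechar="\\"))

print(default_row)
print(escaped_row)  # ['the person replied "this quote" to which i was shocked']
```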

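Second, you are initializing the reader incorrectly. csv.reader expects a stream (any iterable of lines, such as a file object), not a string; a string is iterated character by character, so each character is treated as a separate line: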

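import csv

# newline="" lets the csv module handle newlines inside quoted fields (Python 3).
with open("myfile", newline="") as fo:
    reader = csv.reader(
        fo,
        quotechar='"',
        delimiter=',',
        quoting=csv.QUOTE_ALL,
        skipinitialspace=True,
        escapechar='\\'
    )

    for row in reader:
        print(row)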
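To illustrate the difference between passing a string and passing a stream, here is a small sketch using io.StringIO as an in-memory stand-in for the file; the sample string is the first record from the question:

```python
import csv
import io

# A plain string is iterated one character at a time, so csv.reader
# treats every character as its own line.
char_rows = list(csv.reader("a,b"))
print(char_rows)  # [['a'], ['', ''], ['b']]

# The first record of the question's sample, kept in memory for illustration.
sample = (
    '"0","11/21/2013","NEWYORK","USA\n'
    ' Atlantic ","the person replied \\"this quote\\" to which i was shocked,\n'
    'this came as an utter surprise"\n'
)

# io.StringIO yields whole lines, which is what the reader expects; quoted
# fields that span several lines are reassembled into one field.
rows = list(csv.reader(io.StringIO(sample), escapechar="\\"))
print(rows)
```

When run against the full file from the question, the blank lines between records come back as empty rows ([]), which an "if row:" check can skip. The third record also has a line break outside the quotes after "France",, so its last field will come back as a separate row.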

Upvotes: 10
