Reputation: 8823
I'm using Python to read a CSV file with the following settings:
import ssl
import urllib2  # Python 2; needed for urlopen below
import unicodecsv

# Unverified SSL context: certificate checks are skipped
ctx = ssl._create_unverified_context()
response = urllib2.urlopen(url, timeout=300, context=ctx)
data = unicodecsv.reader(response,
                         delimiter=";",
                         quotechar="\"",
                         doublequote=False,
                         quoting=unicodecsv.QUOTE_ALL,
                         skipinitialspace=True,
                         encoding="utf-8-sig")
For this line:
"ID";"Product";"URL";"Color";"Stock"
it returns: "ID", Product, URL, Color, Stock
So it keeps the quotes on the first element of the line. I use utf-8-sig because the file starts with a BOM.
Upvotes: 0
Views: 223
Reputation: 198324
This is a confirmed bug in unicodecsv for Python 2 (see issue 81).
unicodecsv.UnicodeReader is not passing along the encoding to the underlying csv.reader, so it doesn't know the BOM should be stripped, so the first field doesn't start with the quotechar, and doesn't count as a quoted field.
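To see the mechanism in isolation, here is a minimal sketch that uses the Python 3 standard library csv module rather than unicodecsv itself (that substitution is my assumption, chosen so the snippet runs without Python 2). The sample line is made up to match the one in the question. When the BOM is left in the text, the first field no longer begins with the quotechar, so its quotes are kept as literal characters:

```python
import csv
import io

# Decoding with plain "utf-8" (not "utf-8-sig") leaves the BOM in the text,
# which mimics a reader that was never told to strip it.
raw = b'\xef\xbb\xbf"ID";"Product"\r\n'
text = raw.decode("utf-8")

reader = csv.reader(io.StringIO(text, newline=""),
                    delimiter=";",
                    quotechar='"',
                    quoting=csv.QUOTE_ALL)
rows = list(reader)

# The first field starts with U+FEFF, not with the quotechar, so it is
# treated as unquoted; the second field is parsed normally.
print(rows)  # [['\ufeff"ID"', 'Product']]
```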
The issue is at this moment 2.5 years old, the project was last touched 4+ years ago, and seems abandoned (per issue 92). I highly suggest moving away from unicodecsv. If you have to use it, read the response yourself into a string, strip the BOM, then pass the cleaned-up text to unicodecsv via io.StringIO.
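If moving to Python 3 is an option, the same strip-then-parse idea can be sketched with only the standard library. This is an illustration under that assumption, not a drop-in replacement for the code in the question: read_semicolon_csv is a hypothetical helper name, and the sample bytes stand in for response.read().

```python
import codecs
import csv
import io


def read_semicolon_csv(raw_bytes):
    """Parse ';'-delimited, quoted CSV bytes, dropping a leading UTF-8 BOM."""
    # Strip the BOM by hand so the first field really starts with the quotechar.
    if raw_bytes.startswith(codecs.BOM_UTF8):
        raw_bytes = raw_bytes[len(codecs.BOM_UTF8):]
    text = raw_bytes.decode("utf-8")
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=";",
                        quotechar='"',
                        doublequote=False,
                        quoting=csv.QUOTE_ALL,
                        skipinitialspace=True)
    return list(reader)


# Stand-in for the downloaded body, i.e. what response.read() would return.
sample = codecs.BOM_UTF8 + b'"ID";"Product";"URL";"Color";"Stock"\r\n'
rows = read_semicolon_csv(sample)
print(rows)  # [['ID', 'Product', 'URL', 'Color', 'Stock']]
```

Decoding with "utf-8-sig" instead of stripping codecs.BOM_UTF8 by hand would have the same effect here; the explicit check just makes the step visible.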
Upvotes: 1