Alexandru R

Reputation: 8823

CSV reader incorrectly returns first element of line

I'm using Python to read a CSV file with the following settings:

import ssl
import urllib2

import unicodecsv

# url is defined elsewhere; certificate verification is deliberately skipped
ctx = ssl._create_unverified_context()
response = urllib2.urlopen(url, timeout=300, context=ctx)
data = unicodecsv.reader(response,
                         delimiter=";",
                         quotechar="\"",
                         doublequote=False,
                         quoting=unicodecsv.QUOTE_ALL,
                         skipinitialspace=True,
                         encoding="utf-8-sig")

For this line:

"ID";"Product";"URL";"Color";"Stock"

it returns: "ID", Product, URL, Color, Stock

So the first element in the line keeps its quotes. I use utf-8-sig because the file starts with a BOM.

Upvotes: 0

Views: 223

Answers (1)

Amadan

Reputation: 198324

Confirmed as a bug in unicodecsv for Python 2 (see issue 81).

unicodecsv.UnicodeReader does not pass the encoding along to the underlying csv.reader, so the BOM is never stripped before parsing. As a result, the first field starts with the BOM rather than the quotechar, and it is not treated as a quoted field.
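
The effect can be reproduced without unicodecsv. This is a minimal sketch, assuming Python 3 and the standard csv module rather than the Python 2 code path from the question. The helper name parse is made up for illustration. When a BOM sits in front of the opening quote, the parser sees an unquoted field and keeps the quotes literally.

```python
import csv
import io

# Header line from the question, preceded by a BOM (U+FEFF) that was not stripped.
line_with_bom = '\ufeff"ID";"Product";"URL";"Color";"Stock"\r\n'


def parse(text):
    """Parse semicolon-delimited text with the standard csv module."""
    stream = io.StringIO(text, newline="")
    return list(csv.reader(stream, delimiter=";", quotechar='"'))


# BOM left in place: the first field does not start with the quotechar.
rows_with_bom = parse(line_with_bom)

# BOM removed first: every field is recognised as quoted.
rows_clean = parse(line_with_bom.lstrip("\ufeff"))

print(rows_with_bom[0])
print(rows_clean[0])
```

Only the first field is affected, because every later field begins right after a delimiter and so starts with the quotechar.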

At the time of writing, the issue is 2.5 years old; the project was last touched more than 4 years ago and seems abandoned (per issue 92). I highly suggest moving away from unicodecsv. If you have to use it, read the response into a byte string yourself, strip the BOM, then pass the cleaned-up data to unicodecsv via io.BytesIO (on Python 2, io.StringIO only accepts unicode text, while unicodecsv expects bytes).
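
One way to move away from unicodecsv is sketched below, assuming Python 3: strip the BOM from the raw bytes, decode them, and hand the text to the standard csv module. The helper name read_rows and the sample bytes are hypothetical; with a real HTTP response you would pass the result of response.read() instead.

```python
import codecs
import csv
import io


def read_rows(raw_bytes):
    """Strip a leading UTF-8 BOM, decode, and parse semicolon-delimited CSV."""
    if raw_bytes.startswith(codecs.BOM_UTF8):
        raw_bytes = raw_bytes[len(codecs.BOM_UTF8):]
    text = raw_bytes.decode("utf-8")
    stream = io.StringIO(text, newline="")
    return list(csv.reader(stream, delimiter=";", quotechar='"',
                           skipinitialspace=True))


# Hypothetical sample payload: BOM, header line, one data line.
sample = codecs.BOM_UTF8 + b'"ID";"Product"\r\n"1";"Chair"\r\n'
rows = read_rows(sample)
print(rows)
```

Decoding with "utf-8-sig" instead of "utf-8" would remove the BOM in a single step; the explicit check is kept here only to mirror the steps described above.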

Upvotes: 1
