unicodecsv.DictReader not working with io.StringIO (Python 2.7)

Question

I was trying to use csv.DictReader to parse UTF-8 data containing non-ASCII characters, but I got the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 2: ordinal not in range(128)

I read online that Python 2.7's csv module doesn't support Unicode input, so I looked for an alternative library and found unicodecsv.
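For context, the usual Python 2 workaround, which is roughly what unicodecsv is meant to automate, is to keep the input as UTF-8 encoded bytes, let the csv module parse those bytes, and decode each cell afterwards. Handing Python 2's csv reader unicode lines instead makes it implicitly encode them with the ASCII codec, which matches the error above (position 2 of u'Jo\xe3o' is the ã). The sketch below is stdlib-only and assumes the whole CSV is already in memory; `decode_rows` is a hypothetical name, and on Python 3 it decodes before parsing because the csv module there works on text.

```python
from __future__ import print_function

import csv
import sys


def decode_rows(raw_bytes, encoding='utf-8'):
    """Parse encoded CSV bytes and return each row as a list of text cells.

    Hypothetical helper for illustration only.
    """
    # splitlines(True) splits on \n, \r\n and \r but keeps each line ending.
    lines = raw_bytes.splitlines(True)
    if sys.version_info[0] == 2:
        # Python 2: csv parses byte strings, so decode every cell afterwards.
        return [[cell.decode(encoding) for cell in row]
                for row in csv.reader(lines)]
    # Python 3: csv parses text, so decode each line before parsing.
    # (UTF-8 multibyte sequences never contain \r or \n bytes.)
    return list(csv.reader(line.decode(encoding) for line in lines))


data = (b'first_name,last_name\n'
        b'Jo\xc3\xa3o Ant\xc3\xb4nio,Ara\xc3\xbajo\n')
for row in decode_rows(data):
    print(row)
```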

I replaced csv with unicodecsv, but I still get the same error. Here's a simplified version of my code:

from io import StringIO
from unicodecsv import DictReader, Dialect, QUOTE_MINIMAL

data = (
    'first_name,last_name,email\r'
    'Elmer,Fudd,elmer@looneytunes.com\r'
    'Jo\xc3\xa3o Ant\xc3\xb4nio,Ara\xc3\xbajo,joaoantonio@araujo.com\r'
)

unicode_data = StringIO(unicode(data, 'utf-8-sig'), newline=None)

class CustomDialect(Dialect):
    delimiter = ','
    doublequote = True
    escapechar = '\\'
    lineterminator = '\r'
    quotechar = '"'
    quoting = QUOTE_MINIMAL
    skipinitialspace = True

rows = DictReader(unicode_data, dialect=CustomDialect)

for row in rows:
    print row

If I replace StringIO with BytesIO, the encoding error goes away, but BytesIO doesn't accept the newline argument, and I get:

Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

Does anybody know how I could solve this? Shouldn't unicodecsv be able to handle StringIO? Thanks
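One way around the missing newline option, sketched here under the assumption that the whole CSV fits in memory, is to split the UTF-8 bytes into lines yourself before handing them to the reader. A byte string's splitlines() treats \r\n, \r and \n all as line breaks, which approximates what universal-newline mode does for a real file. The helper name `universal_lines` is hypothetical.

```python
from __future__ import print_function


def universal_lines(raw_bytes):
    r"""Split raw_bytes on \r\n, \r or \n and end every line with b'\n'.

    Hypothetical helper that mimics universal-newline mode for data that
    only exists in memory (so there is no file to open with mode 'rU').
    """
    return [line + b'\n' for line in raw_bytes.splitlines()]


# Old Mac-style data: lines separated by bare carriage returns.
mac_style = b'first_name,last_name\rJo\xc3\xa3o,Ara\xc3\xbajo\r'
print(universal_lines(mac_style))
```

Because csv readers accept any iterable that yields lines, the resulting list could then be passed to unicodecsv's DictReader in place of the StringIO/BytesIO object, e.g. DictReader(universal_lines(data), dialect=CustomDialect), with data kept as the original UTF-8 byte string rather than decoded to unicode. This assumes unicodecsv, like the stdlib csv module, accepts a plain iterable of byte lines. As with universal-newline mode, line breaks inside quoted fields are normalized to \n as well.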
