kyrenia

Reputation: 5575

Recode bytes which cannot be decoded as UTF-8 in Python

I am reading in text files, and there is one byte that is causing me decoding issues:

    import unicodecsv

    with open(input_filename_and_director, 'rb') as f:
        r = unicodecsv.reader(f, delimiter="|")

This results in the following error:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 26: invalid continuation byte

Is there any way to specify how these bytes should be handled (e.g. to read this byte in as another character)?
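A minimal reproduction of this kind of error, assuming a hypothetical input line (SAMPLE and strict_decode_error are illustrative names, not from the question). 0xc3 is a UTF-8 lead byte that must be followed by a continuation byte in the range 0x80-0xBF; when the next byte is anything else, strict decoding fails with "invalid continuation byte". The sketch uses Python 3's bytes.decode rather than unicodecsv, so its message names the codec 'utf-8' instead of Python 2's 'utf8'.

```python
# Hypothetical sample line: 0xc3 is a UTF-8 lead byte that must be followed
# by a continuation byte in the range 0x80-0xBF. Here it is followed by "|"
# (0x7c), so strict UTF-8 decoding fails at index 8.
SAMPLE = b"name|caf\xc3|price\n"


def strict_decode_error(raw: bytes) -> str:
    """Return the UnicodeDecodeError message for raw, or "" if it is valid UTF-8."""
    try:
        raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


print(strict_decode_error(SAMPLE))
```

This should report byte 0xc3 at position 8 with the same "invalid continuation byte" reason as above. A lone 0xc3 like this often means the file is not actually UTF-8 (for example Latin-1 or cp1252), though that is an assumption about the data, not something the question confirms.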

Upvotes: 2

Views: 202

Answers (1)

cge

Reputation: 9888

Depending on what you want, try using unicodecsv.reader(f, delimiter="|", errors='replace') or unicodecsv.reader(f, delimiter="|", errors='ignore'). With 'replace', each undecodable byte becomes the replacement character U+FFFD; with 'ignore', it is silently dropped. unicodecsv passes the errors parameter through to the Unicode decoding step. See the help for unicode or here for more information.
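For comparison, a sketch of the same error handlers in Python 3, where the standard csv module reads text and the encoding and errors arguments go on the text stream instead of on the reader. It uses in-memory bytes in place of a file; RAW and read_rows are hypothetical names. It also shows the encoding argument, which is one way to "read this byte in as another character", but decoding as Latin-1 is only correct if the file really is Latin-1 (an assumption here).

```python
import csv
import io

# Hypothetical pipe-delimited data containing a stray 0xc3 byte.
RAW = b"name|caf\xc3|price\nbread|1.50|eur\n"


def read_rows(raw: bytes, encoding: str = "utf-8", errors: str = "strict") -> list:
    """Decode raw with the given encoding and error handler, then split on '|'."""
    # newline="" is what the csv module expects for its underlying text stream.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, errors=errors, newline="")
    return list(csv.reader(text, delimiter="|"))


replaced = read_rows(RAW, errors="replace")  # replaced[0] == ['name', 'caf\ufffd', 'price']
ignored = read_rows(RAW, errors="ignore")    # ignored[0] == ['name', 'caf', 'price']
latin1 = read_rows(RAW, encoding="latin-1")  # latin1[0] == ['name', 'caf\u00c3', 'price']
```

With a real file, the Python 3 equivalent would be open(input_filename_and_director, newline="", encoding="utf-8", errors="replace") passed to csv.reader.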

Upvotes: 1
