safetyduck

Reputation: 6834

Python encoding error

What do you do with this kind of error when you are reading lines from a file and don't know its encoding?

What does "byte 0xed" mean? What does "position 3792" mean?

I'll try to answer this myself and repost, but I'm slightly annoyed at how long it is taking me to figure this out. Is there a clobber/ignore-and-continue method for getting past unknown encodings? I just want to read a text file!

Traceback (most recent call last):
  File "./test.py", line 8, in <module>
    for x in fin:
  File "/bns/rma/local/lib/python3.1/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 3792: ordinal not in range(128)

Upvotes: 0

Views: 2210

Answers (2)

safetyduck

Reputation: 6834

I think I found the way to be dumb :) :

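# fin must be opened in binary mode ('rb') so that each line is bytes
fin = (x.decode('ascii', 'ignore') for x in fin)

for x in fin:
    print(x)

The second argument to decode is the error handler: 'ignore' drops undecodable bytes, and it could be 'replace' or another handler instead. This at least follows the "garbage in, garbage out" idiom that I am seeking.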
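A text-mode variant of the same idea, as a rough sketch: Python 3's built-in open accepts an errors argument, so the file object can do the clobbering itself. The helper name read_lines_lossy and the throwaway demo file are made up for illustration.

```python
import os
import tempfile

def read_lines_lossy(path, encoding="ascii", errors="replace"):
    """Return the lines of a text file without raising on bad bytes.

    Hypothetical helper, not a library function. errors="replace" swaps
    each undecodable byte for U+FFFD; errors="ignore" drops it silently.
    """
    with open(path, encoding=encoding, errors=errors) as fin:
        return [line.rstrip("\n") for line in fin]

# Demo with a throwaway file that contains one non-ASCII byte (0xed).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xed\nplain\n")
    demo_path = tmp.name

replaced = read_lines_lossy(demo_path)                  # bad byte becomes U+FFFD
ignored = read_lines_lossy(demo_path, errors="ignore")  # bad byte is dropped
os.remove(demo_path)
print(replaced, ignored)
```

Either handler loses information, so this only fits the "garbage in, garbage out" case where the exact characters don't matter.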

Upvotes: 0

GaretJax

Reputation: 7780

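0xed is the value (237 in decimal) of a byte in the input that the ascii codec cannot decode, since ASCII only covers 0x00 to 0x7f. If the file is Latin-1 (ISO-8859-1), that byte is í; if it is UTF-8, it is the first byte of a multi-byte sequence. Position 3792 is the zero-based offset of that byte in the data being decoded, counted in bytes rather than letters.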
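To see where the two numbers in the message come from, here is a small sketch that triggers the same error on a made-up byte string and inspects the exception. The sample data is an assumption; start and object are standard attributes of UnicodeDecodeError.

```python
data = b"abc\xeddef"  # made-up input: byte 0xed sits at index 3

try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    bad_offset = err.start            # zero-based index of the offending byte
    bad_byte = err.object[err.start]  # integer value of that byte
    print(f"byte {bad_byte:#x} at position {bad_offset}")

# The same byte value means different things under different codecs.
as_latin1 = bytes([0xED]).decode("latin-1")  # a single byte maps to U+00ED
as_utf8 = "\u00ed".encode("utf-8")           # U+00ED takes two bytes in UTF-8
```

In the traceback from the question, the same fields would read 0xed and 3792.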

You are using the ascii codec to decode the file, but the file is not ASCII-encoded. Try a Unicode-aware codec instead (utf_8 maybe?), or, if you know the encoding used to write the file, choose the appropriate encoding from the full list of available codecs.
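If the encoding really is unknown, one possible approach is to try a short list of candidates in order. This is only a sketch: decode_with_fallback is a hypothetical helper, and the candidate list is a guess. Note that latin-1 accepts every byte value, so it never fails but may produce the wrong characters.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Decode raw bytes with the first candidate codec that succeeds.

    Hypothetical helper for illustration. Returns (text, codec_name).
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes; try the next one
    raise ValueError("none of the candidate encodings matched")

# Valid UTF-8 is accepted by the first candidate.
utf8_text, utf8_codec = decode_with_fallback("caf\u00ed".encode("utf-8"))

# A lone 0xed byte is not valid UTF-8, so the helper falls back to latin-1.
legacy_text, legacy_codec = decode_with_fallback(b"caf\xed")
print(utf8_codec, legacy_codec)
```

Putting the strictest codec first matters here, because a permissive codec earlier in the list would mask every later candidate.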

Upvotes: 3
