Jack S.

Reputation: 2793

Python using wrong encoding

Using Python 3.2, I am trying to decode bytes with str(bytes, "cp1251"), but I get this error:

Traceback (most recent call last):
  File "C:\---\---\---\---.py", line 4, in <module>
    writetemp.write(str(f.read(), "cp1251"))
  File "C:\Python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 19-25: character maps to <undefined>

As you can see, I specified "cp1251", but Python attempts to use "cp1252.py" to decode instead of "cp1251.py", which I think causes the error. The same thing happens if I try "Windows-1251" instead of "cp1251".

Upvotes: 1

Views: 1246

Answers (1)

Thomas Wouters

Reputation: 133395

Note how what you're getting is a UnicodeEncodeError, not a UnicodeDecodeError. The error doesn't come from your str(f.read(), "cp1251") call. Instead, it comes from the writetemp.write() call.
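To make the difference between the two errors concrete, here is a minimal sketch. The sample text is a hypothetical stand-in for whatever your file contains; the point is only that decoding cp1251 bytes succeeds, while encoding the resulting string to cp1252 is the step that fails.

```python
# Hypothetical sample data standing in for the bytes returned by f.read().
raw = "Привет".encode("cp1251")

# Decoding: bytes -> str. This is what str(f.read(), "cp1251") does, and it works.
text = str(raw, "cp1251")

# Encoding: str -> bytes. This is what a text-mode write() does internally.
# cp1252 has no Cyrillic letters, so this is where the error is raised.
try:
    text.encode("cp1252")
    failed_on_encode = False
except UnicodeEncodeError:
    failed_on_encode = True
```

The decode step never touches cp1252; only the encode step does, which matches the cp1252.py frame in your traceback.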

The str() call decodes the bytes you get from f.read() using cp1251 as the encoding, and that works: it gives you a string (which is Unicode in Python 3). writetemp.write() then has to turn the string back into bytes by encoding it. It does that using the encoding you passed when opening writetemp or, if you passed none, the default I/O encoding (which Python guesses from the platform's locale settings). You can see which encoding that is by looking at the encoding attribute of the file object; you'll probably find it is cp1252. If you want to write in a particular encoding, don't rely on Python guessing at it; explicitly specify the encoding when you open the file.
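As a rough sketch of what that could look like: the file names, the sample text, and the choice of UTF-8 for the output are all assumptions made for illustration, since the question doesn't show how writetemp is opened. Any encoding that can represent your characters (including cp1251 itself) would do.

```python
import os
import tempfile

# Hypothetical paths and sample content, created only so the example runs on its own.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.txt")
dst_path = os.path.join(workdir, "temp.txt")

with open(src_path, "wb") as f:
    f.write("Привет, мир".encode("cp1251"))

# Read raw bytes and decode them as cp1251, as in the question.
with open(src_path, "rb") as f:
    text = str(f.read(), "cp1251")

# Open the output file with an explicit encoding instead of the platform default.
with open(dst_path, "w", encoding="utf-8") as writetemp:
    used_encoding = writetemp.encoding  # the attribute mentioned above
    writetemp.write(text)

# Read the file back with the same encoding to confirm the text survived.
with open(dst_path, encoding="utf-8") as f:
    round_trip = f.read()
```

Opening the source file in text mode with encoding="cp1251" would let you skip the explicit str() call as well.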

Upvotes: 5
