Python using wrong encoding

Question

Using Python 3.2, I am trying to decode bytes using str(bytes, "cp1251"), but I get this error:

Traceback (most recent call last):
  File "C:\---\---\---\---.py", line 4, in <module>
    writetemp.write(str(f.read(), "cp1251"))
  File "C:\Python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 19-25: character maps to <undefined>

As you can see, I specified "cp1251", but it attempts to use "cp1252.py" to decode instead of "cp1251.py", which causes the error, I think. Same thing occurs if I try "Windows-1251" instead of "cp1251".

Thomas Wouters · Accepted Answer

Note how what you're getting is a UnicodeEncodeError, not a UnicodeDecodeError. The error doesn't come from your str(f.read(), "cp1251") call. Instead, it comes from the writetemp.write() call.
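To make the distinction concrete, here is a minimal sketch of the two steps in isolation. The sample text is made up for illustration, and the explicit encode("cp1252") call only stands in for what write() does implicitly on a file whose encoding is cp1252; it is an assumption about your setup, not something taken from your script.

```python
# Hypothetical sample data: Cyrillic text stored as cp1251 bytes.
raw = "Привет".encode("cp1251")

# Decoding step: the same operation as str(f.read(), "cp1251"). It succeeds.
text = str(raw, "cp1251")

# Encoding step: stands in for write() on a file opened with cp1252.
# Cyrillic letters have no mapping in cp1252, so this is the step that fails.
try:
    text.encode("cp1252")
    error_name = None
except UnicodeError as exc:
    error_name = type(exc).__name__

print(error_name)
```

The exception raised here is the encode variant, which matches the last line of your traceback.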

The str() call decodes the bytes you get from f.read() using cp1251 as the encoding. That works, and it gives you a string (which is Unicode in Python 3). writetemp.write() then has to turn the string back into bytes by encoding it. It does that using the encoding you passed when opening writetemp or, if you passed none, the default I/O encoding, which Python takes from the platform's locale settings (see locale.getpreferredencoding()). You can see which encoding that is by looking at the encoding attribute of the file object. You'll probably find it is cp1252, which is why the traceback points at cp1252.py. If you want to write in a particular encoding, don't rely on Python guessing at it; explicitly specify the encoding argument when you open the file.
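As a rough sketch of that advice, the snippet below reads cp1251 bytes and writes them to a file opened with an explicit encoding. The file names, the temporary directory, and the sample text are all hypothetical, and UTF-8 is just one reasonable choice for the output; use cp1251 instead if the output has to stay in that encoding.

```python
import os
import tempfile

# Hypothetical file names and sample text, for illustration only.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.txt")
dst_path = os.path.join(workdir, "temp.txt")

# Create a cp1251-encoded input file to stand in for the real one.
with open(src_path, "wb") as f:
    f.write("Привет, мир".encode("cp1251"))

# Read raw bytes and decode them explicitly, as in the question.
with open(src_path, "rb") as f:
    text = str(f.read(), "cp1251")

# Name the output encoding instead of relying on the platform default.
with open(dst_path, "w", encoding="utf-8") as writetemp:
    used_encoding = writetemp.encoding  # the encoding write() will use
    writetemp.write(text)

# Read the file back with the same encoding to confirm the round trip.
with open(dst_path, "r", encoding="utf-8") as f:
    round_trip = f.read()
```

Checking writetemp.encoding on your own file object, before changing anything, should confirm whether cp1252 is what Python picked by default.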
