Reputation: 83187
I have a UTF-8 encoded file file.txt containing just one character, ♠, and a CP-1252 encoded Python script test.py containing:
import codecs
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))
When I run it in Eclipse 4.7.2 on Windows 7 SP1 x64 Ultimate and with Python 3.5.2 x64, I get the error message:
Traceback (most recent call last):
File "C:\eclipse-4-7-2-workspace\SEtest\test.py", line 3, in <module>
print('text: {0}'.format(text))
File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 6: character maps to <undefined>
My understanding is that the issue stems from the fact that on Microsoft Windows the Python interpreter uses CP-1252 as its default encoding and therefore has issues with the character ♠.
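As a quick check of that understanding, here is a minimal sketch that tests whether the spade character can be represented in a given encoding. The helper name `can_encode` is made up for illustration; it only relies on `str.encode`, and it prints plain `True`/`False` so the check itself cannot trip over the console encoding.

```python
# Hypothetical helper: does `encoding` have a byte sequence for `text`?
SPADE = '\u2660'  # the spade character from file.txt


def can_encode(text, encoding):
    """Return True if `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# CP-1252 has no slot for U+2660, while UTF-8 can encode any code point.
print(can_encode(SPADE, 'cp1252'))
print(can_encode(SPADE, 'utf-8'))
```

If the first check reports that CP-1252 cannot encode the character, that is consistent with the traceback above, which ends inside encodings\cp1252.py.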
I should also note that I kept Eclipse's default encoding, which can be seen in Preferences > General > Workspace.
When I change the Python script test.py to:
import codecs
print(u'♠') # <--- adding this line is the only modification
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))
and then try to run it, I get an error dialog when the script is saved (note: Eclipse is configured to save the script whenever I run it).
After selecting the option Save as UTF-8, I get the same error message:
Traceback (most recent call last):
File "C:\Users\Francky\eclipse-4-7-2-workspace\SEtest\test.py", line 2, in <module>
print(u'\u2660')
File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 0: character maps to <undefined>
which I think is expected since the Python interpreter still uses CP-1252.
But if I run the script again in Eclipse without any modification, it works. The output is:
♠
text: ♠
Why does it work?
Upvotes: 1
Views: 523
Reputation: 34165
Python converts the text to be printed to the encoding of the console, which is the active code page on Windows (at least until version 3.6).
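To illustrate that conversion step without depending on any particular console, here is a rough sketch that prints to an in-memory stream with an explicitly chosen encoding. The function name `print_to_stream` is hypothetical, and the in-memory stream is only a stand-in for the real console: it assumes the console behaves like a text stream with a fixed encoding and strict error handling.

```python
import io


def print_to_stream(text, encoding):
    """Print `text` to an in-memory text stream that uses `encoding`.

    Returns the bytes that would have been sent to the console.
    Raises UnicodeEncodeError if `encoding` cannot represent `text`.
    """
    raw = io.BytesIO()
    # newline='\n' keeps the output identical on every platform.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline='\n')
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


# With UTF-8 the spade is written as three bytes.
print(ascii(print_to_stream('text: \u2660', 'utf-8')))

# With CP-1252 the same print call fails, as in the question.
try:
    print_to_stream('text: \u2660', 'cp1252')
except UnicodeEncodeError as exc:
    print('failed with', type(exc).__name__)
```

On a real run, the encoding Python has picked for standard output can be inspected with `sys.stdout.encoding`.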
To avoid the UnicodeEncodeError, you have to change the console encoding to UTF-8. There are several ways to do this, e.g. on the Windows command line by executing cmd /K chcp 65001.
In Eclipse, the encoding of the console can be set to UTF-8 in the run configuration (Run > Run Configurations...), in the Common tab.
The text file encoding settings in Window > Preferences: General > Workspace and in Project > Properties: Resource are only used by text editors to determine how to display text files.
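As a small illustration of why the encoding chosen for reading a file matters for display, the sketch below decodes the UTF-8 bytes of the spade character once as UTF-8 and once as CP-1252. It is only a model of what a text editor does when it interprets file bytes, not a description of Eclipse internals; `ascii()` is used so the demo prints safely on any console.

```python
SPADE = '\u2660'

# The three bytes stored in a UTF-8 encoded file.txt.
utf8_bytes = SPADE.encode('utf-8')

# Interpreting those bytes with the right and the wrong encoding.
as_utf8 = utf8_bytes.decode('utf-8')
as_cp1252 = utf8_bytes.decode('cp1252')

print(ascii(utf8_bytes))   # the raw bytes
print(ascii(as_utf8))      # one character: the spade
print(ascii(as_cp1252))    # three unrelated characters
```

The bytes on disk are the same in both cases; only the interpretation differs.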
Upvotes: 1