Franck Dernoncourt

Reputation: 83187

Why does "Save as UTF-8" in Eclipse fix the Python UnicodeEncodeError?

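I have the following Python script test.py:

import codecs
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))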

When I run it in Eclipse 4.7.2 on Windows 7 SP1 x64 Ultimate and with Python 3.5.2 x64, I get the error message:

Traceback (most recent call last):
  File "C:\eclipse-4-7-2-workspace\SEtest\test.py", line 3, in <module>
    print('text: {0}'.format(text))
  File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 6: character maps to <undefined>

My understanding is that the issue stems from the fact that on Microsoft Windows the Python interpreter uses CP-1252 as its default encoding and therefore has issues with the character ♠.

Also, note that I kept Eclipse's default encoding, which can be seen in Preferences > General > Workspace:

[Screenshot: text file encoding setting in the Eclipse workspace preferences]

When I change the Python script test.py to:

import codecs
print(u'♠') # <--- adding this line is the only modification
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))

then try to run it, I get the error message:

[Screenshot: Eclipse dialog offering the option Save as UTF-8]

(note: Eclipse is configured to save the script whenever I run it).

After selecting the option Save as UTF-8, I get the same error message:

Traceback (most recent call last):
  File "C:\Users\Francky\eclipse-4-7-2-workspace\SEtest\test.py", line 2, in <module>
    print(u'\u2660')
  File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 0: character maps to <undefined>

which I think is expected since the Python interpreter still uses CP-1252.

But if I run the script again in Eclipse without any modification, it works. The output is:

♠
text: ♠

Why does it work?

Upvotes: 1

Views: 523

Answers (1)

howlger

Reputation: 34165

Python converts the text to be printed to the encoding of the console, which on Windows is the active code page (at least until version 3.6).
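The conversion step can be reproduced without any console. The following is a minimal sketch, assuming the console encoding is cp1252 as named in the traceback; the helper name can_encode is only illustrative and not part of any library:

```python
def can_encode(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


spade = '\u2660'  # BLACK SPADE SUIT, the character from the question

# cp1252 is a single-byte code page with no entry for U+2660.
print(can_encode(spade, 'cp1252'))
# UTF-8 can encode every Unicode code point.
print(can_encode(spade, 'utf-8'))
```

The first call reports a failure for the same reason print() fails in the question: the character has no mapping in the code page.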

To avoid the UnicodeEncodeError you have to change the console encoding to UTF-8. There are several ways to do this, e.g. on the Windows command line by executing cmd /K chcp 65001.

In Eclipse, the encoding of the console can be set to UTF-8 in the run configuration (Run > Run Configurations...), in the Common tab.
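To illustrate why the console encoding is what matters, here is a rough sketch that simulates a console with an in-memory stream. The function name print_to_console is hypothetical, and a real console is of course not a BytesIO object; the sketch only shows that the same print() call succeeds or fails depending on the encoding of the stream it writes to:

```python
import io


def print_to_console(text, console_encoding):
    """Print text to a simulated console and return the raw bytes written."""
    raw = io.BytesIO()
    # newline='\n' keeps the output identical on every platform.
    console = io.TextIOWrapper(raw, encoding=console_encoding, newline='\n')
    print(text, file=console)
    console.flush()
    return raw.getvalue()


# A UTF-8 "console" accepts the character.
print(print_to_console('\u2660', 'utf-8'))  # b'\xe2\x99\xa0\n'

# A cp1252 "console" raises the error from the question.
try:
    print_to_console('\u2660', 'cp1252')
except UnicodeEncodeError as error:
    print('cp1252 console failed:', error.reason)
```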

The text file encoding settings in Window > Preferences: General > Workspace and in Project > Properties: Resource are only used by text editors to decide how to display text files.
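What such a display setting changes can be sketched as follows, assuming the file was saved as UTF-8: the bytes on disk stay the same, and only their interpretation differs. The variable names are illustrative, and ascii() is used so that the example itself prints safely on a cp1252 console:

```python
# The bytes that a UTF-8 save writes to disk for the spade character.
utf8_bytes = '\u2660'.encode('utf-8')

# An editor configured for UTF-8 reads them back as the original character.
as_utf8 = utf8_bytes.decode('utf-8')

# An editor configured for cp1252 reads the same three bytes as three
# unrelated characters.
as_cp1252 = utf8_bytes.decode('cp1252')

print(ascii(as_utf8))
print(ascii(as_cp1252))
```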

Upvotes: 1
