TheGreenFrog

Reputation: 155

Printing Unicode Character in Python 3.6.5

I am using PyDev in Eclipse on Windows to write code in Python 3.6.5. I get an error when running this single line of code:

print("•")

This is the error I get:

SyntaxError: Non-UTF-8 code starting with '\x95' in file C:\Users\short\workspace\Python Test 4\src\foo.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I thought Python 3.6 was supposed to make UTF-8 the default encoding. What am I doing wrong?

Upvotes: 1

Views: 750

Answers (2)

abarnert

Reputation: 365627

The problem isn't with Python, but with your text editor. Python is defaulting to reading your file as UTF-8, but since your file isn't UTF-8, this fails.

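If you save a file as cp1252 or a similar legacy Windows code page, • encodes to \x95. That isn't valid UTF-8: every UTF-8 sequence starts with either a byte below 0x80 (plain ASCII) or a lead byte of 0xC2 or above, and \x95 is neither, since bytes from 0x80 to 0xBF can only appear as continuation bytes after a lead byte. Hence the error. (The UTF-8 for • is \xe2\x80\xa2.)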
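
To see the difference for yourself, here is a minimal sketch using only the built-in codecs. The variable names are just for illustration; nothing here depends on Eclipse or PyDev.

```python
bullet = "\u2022"  # the bullet character from the question

cp1252_bytes = bullet.encode("cp1252")
utf8_bytes = bullet.encode("utf-8")
print(cp1252_bytes)  # b'\x95'
print(utf8_bytes)    # b'\xe2\x80\xa2'

# Reading the cp1252 byte back as UTF-8 fails, which is essentially
# what happens when Python 3 reads the source file.
decode_error = None
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc.reason)  # invalid start byte
```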


If you've somehow configured Eclipse to edit source code using your system's default encoding instead of UTF-8, fix that. This question shows how to change the cross-language default settings for various versions of Eclipse, but the short version is that it's either: Preferences | General | Workspace | Text File Encoding or Preferences | General | Editors | Text editors. There are also per-language overrides somewhere under Editor, and you can also set a per-project override.


Another possibility: Eclipse will, by default, auto-detect the encoding for existing files and preserve it, instead of using its own preferred encoding. Since you're on Windows, it's quite possible that you originally created the file with Notepad (or some other Windows editor that isn't designed for programming), which defaults to your system's "ANSI code page" (cp1252 on most Western-language installs, which matches the \x95 byte in your error).

If so, don't do that. Never touch source code with Notepad. While you can force Notepad to save files as UTF-8, it's a pain (and then you'll just get problems with \xef\xbb\xbf UTF-8-SIG prefixes in all your files). If you don't want to use Eclipse itself for editing for some reason, almost any other free text editor will do.

If this is the problem, to fix it, you just need to manually save-as the file to UTF-8 once, and then from now on it'll auto-detect as UTF-8 and work properly.
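
If you'd rather do the one-time conversion from a script instead of the editor's save-as dialog, something like the following sketch should work. It assumes the file really is cp1252 (check that first), and `convert_to_utf8` is a hypothetical helper name, not anything built in. It writes plain UTF-8 with no \xef\xbb\xbf prefix. The demo runs on a throwaway file, so back up real sources before trying it on them.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    """Re-save a text file as UTF-8 (no BOM), assuming it is in source_encoding."""
    path = Path(path)
    text = path.read_bytes().decode(source_encoding)
    # Work in bytes so line endings are left exactly as they were.
    path.write_bytes(text.encode("utf-8"))


# Demonstration on a temporary stand-in for foo.py.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "foo.py"
    demo.write_bytes(b'print("\x95")\n')  # cp1252-encoded source
    convert_to_utf8(demo)
    converted = demo.read_bytes()
    print(converted)  # b'print("\xe2\x80\xa2")\n'
```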


Alternatively, you could leave the file in cp1252 or whatever, and use a PEP-263 coding declaration, as mentioned in the error message, to override the UTF-8 default. But you'll be a lot happier going to UTF-8.
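
For reference, a PEP 263 declaration is a comment such as `# -*- coding: cp1252 -*-` on the first or second line of the file. The sketch below builds the bytes of a hypothetical cp1252-encoded foo.py in memory and uses the standard library's `tokenize.detect_encoding` to show that the declaration is what gets picked up. It assumes cp1252 is the file's actual encoding.

```python
import io
import tokenize

# Bytes of a hypothetical cp1252-encoded foo.py with a coding declaration.
source_bytes = b'# -*- coding: cp1252 -*-\nprint("\x95")\n'

# detect_encoding reads at most two lines, looking for a BOM or a coding comment.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
print(encoding)  # cp1252

# Decoded with the declared encoding, \x95 becomes the bullet (U+2022).
source_text = source_bytes.decode(encoding)
print(ascii(source_text.splitlines()[1]))  # 'print("\u2022")'
```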

Upvotes: 2

Ignacio Vazquez-Abrams

Reputation: 798526

Your source file isn't UTF-8.

>>> '•'.encode('cp1251')
b'\x95'

Read the instructions in the article linked to in the error message and declare the proper charset.

Upvotes: 1
