drscheme

Reputation: 19

Opening a UTF-8 encoded file by a Python 3.6 script running in PyCharm 2016.3.2

I have a rather odd problem with PyCharm and a Python app I am working on.

I have been googling for a solution for some time and none of the proposed ideas helps, so I want to ask here.

I want to open a UTF-8 encoded file using the following code:

#!/usr/bin/env python3    

import os, platform

def read(file):
    f = open(file, "r")
    content = f.read()
    f.close()
    return content

print(platform.python_version())
print(os.environ["PYTHONIOENCODING"])

content = read("testfile")
print(content)

The code crashes when run in PyCharm. The output is:

3.6.0
UTF-8
Traceback (most recent call last):
  File "/Users/xxx/Documents/Scripts/pycharmutf8/file.py", line 14, in <module>
    content = read("testfile")
  File "/Users/xxx/Documents/Scripts/pycharmutf8/file.py", line 7, in read
    content = f.read()
  File "/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
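For context, here is a small sketch (not part of the original post) of why the byte 0xc3 appears in the error. In UTF-8, each of ä, ö and ü is encoded as two bytes beginning with 0xc3, and the ASCII codec rejects any byte above 127. The string used here is only an illustrative stand-in for the non-ASCII part of testfile.

```python
# Illustrative stand-in for the non-ASCII part of "testfile":
# the three umlauts a, o, u written as escapes so this source stays pure ASCII
text = "\u00e4\u00f6\u00fc"
data = text.encode("utf-8")
print(data)  # b'\xc3\xa4\xc3\xb6\xc3\xbc'

try:
    # Roughly what reading the file does when the default encoding is ASCII
    data.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

# Decoding with the correct codec restores the original text
decoded = data.decode("utf-8")
```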

When I run the identical code from command line, it works just fine:

./file.py 
3.6.0
utf-8:surrogateescape
I am a file with evil unicode characters: äöü

I found that in comparable situations people are advised to set the environment variable PYTHONIOENCODING to utf-8:surrogateescape, which I did system-wide (as you can see in the output above)

export PYTHONIOENCODING=utf-8:surrogateescape

and also in PyCharm itself (Settings -> Build -> Console -> Python Console -> Environment variables).

This does not have any effect. Do you have further suggestions?
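A likely reason the setting has no effect is that PYTHONIOENCODING only controls the encoding of sys.stdin, sys.stdout and sys.stderr and is not consulted by open(). When open() is called without encoding=, Python 3.6 uses locale.getpreferredencoding(False), which is derived from the locale environment variables (LC_ALL, then LC_CTYPE, then LANG). The diagnostic sketch below prints both values so that the PyCharm run and the terminal run can be compared. The function name report_encodings is hypothetical.

```python
import locale
import os
import sys


def report_encodings():
    """Collect the settings that decide how the question's script decodes text."""
    return {
        # What open() uses when no encoding= argument is given
        "open() default": locale.getpreferredencoding(False),
        # What print() uses; this is the part PYTHONIOENCODING controls
        "sys.stdout": sys.stdout.encoding,
        # .get() avoids the KeyError that os.environ[...] raises when unset
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
    }


for name, value in report_encodings().items():
    print("{}: {}".format(name, value))
```

If "open() default" reports an ASCII encoding under PyCharm but UTF-8 in the terminal, the difference lies in the locale variables, not in PYTHONIOENCODING.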

Upvotes: 1

Views: 2758

Answers (2)

Rajas Agashe

Reputation: 371

If it's hard to change the encoding of the open call (e.g. because it happens inside a library), you can set this environment variable in the run configuration: LC_CTYPE=en_US.UTF-8

Source: PyCharm is changing the default encoding in my Django app
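To see the effect of LC_CTYPE outside PyCharm, the following sketch starts a fresh interpreter with LC_CTYPE overridden, much as a run-configuration variable would be, and reports the encoding open() would default to in that process. The helper name child_preferred_encoding is hypothetical. The sketch assumes the en_US.UTF-8 locale is installed, and it removes LC_ALL because that variable would take precedence over LC_CTYPE. On Python 3.7+, locale coercion (PEP 538) and UTF-8 mode (PEP 540) can also influence the result.

```python
import os
import subprocess
import sys


def child_preferred_encoding(lc_ctype):
    """Return open()'s default encoding in a new interpreter run with LC_CTYPE=lc_ctype."""
    env = dict(os.environ)
    env.pop("LC_ALL", None)  # LC_ALL would override LC_CTYPE
    env["LC_CTYPE"] = lc_ctype
    result = subprocess.run(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


print(child_preferred_encoding("en_US.UTF-8"))
```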

Upvotes: 3

Mark Tolonen

Reputation: 177951

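If you want to read a UTF-8 file, specify the encoding:

def read(file):
    # Pass the encoding explicitly; without it, open() falls back to the locale's preferred encoding
    with open(file, encoding='utf8') as f:
        content = f.read()
    return content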
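Below is a self-contained check of this fix. A throwaway temporary file stands in for the question's testfile, and its text is modeled on the output shown in the question. The sketch writes raw UTF-8 bytes and reads them back with an explicit encoding, which works regardless of what locale.getpreferredencoding(False) returns.

```python
import os
import tempfile


def read(file):
    # Explicit encoding, so the result no longer depends on the locale
    with open(file, encoding="utf-8") as f:
        return f.read()


# Modeled on the question's testfile; \u escapes keep this source pure ASCII
text = "I am a file with evil unicode characters: \u00e4\u00f6\u00fc"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfile")
    # Write raw UTF-8 bytes, as an editor saving the file would
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    content = read(path)

print(content == text)  # True
```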

Upvotes: 1
