Reputation: 19
I have a rather odd problem with PyCharm and a Python app that I am working on.
I have been googling for a solution for some time, and since none of the proposed ideas helped, I want to ask here.
I want to open a UTF-8 encoded file using the following code:
#!/usr/bin/env python3

import os, platform

def read(file):
    f = open(file, "r")
    content = f.read()
    f.close()
    return content

print(platform.python_version())
print(os.environ["PYTHONIOENCODING"])

content = read("testfile")
print(content)
The code crashes when run in PyCharm. The output is:
3.6.0
UTF-8
Traceback (most recent call last):
  File "/Users/xxx/Documents/Scripts/pycharmutf8/file.py", line 14, in <module>
    content = read("testfile")
  File "/Users/xxx/Documents/Scripts/pycharmutf8/file.py", line 7, in read
    content = f.read()
  File "/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
When I run the identical code from the command line, it works just fine:
./file.py
3.6.0
utf-8:surrogateescape
I am a file with evil unicode characters: äöü
I have found that in comparable situations people are advised to set the environment variable PYTHONIOENCODING to utf-8:surrogateescape. I did that system-wide (as you can see in the output above):
export PYTHONIOENCODING=utf-8:surrogateescape
and also in PyCharm itself (Settings -> Build -> Console -> Python Console -> Environment variables).
This does not have any effect. Do you have any further suggestions?
Upvotes: 1
Views: 2758
Reputation: 371
If it's hard to change the encoding for the open call, for example because it happens inside a library, you can set this environment variable in the run configuration instead: LC_CTYPE=en_US.UTF-8
Source: PyCharm is changing the default encoding in my Django app
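To see which encoding a given environment will actually hand to open(), a quick diagnostic along these lines may help. It is only a sketch: describe_encodings is a hypothetical helper name, and the set of variables it reports is my own choice, not something PyCharm prescribes. Run it once from the terminal and once from the PyCharm run configuration and compare the two outputs.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encoding-related settings of the current interpreter."""
    return {
        # Default used by open() in text mode when no encoding is passed.
        "open_default": locale.getpreferredencoding(False),
        # Encoding of the console stream; this is what PYTHONIOENCODING controls.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale-related environment variables (None if unset).
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

If open_default differs between the two runs, the locale variables in the run configuration are the likely place to look.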
Upvotes: 3
Reputation: 177951
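If you want to read a UTF-8 file, specify the encoding explicitly. Without it, open() falls back to the locale's preferred encoding, which is ASCII in your PyCharm run; PYTHONIOENCODING only affects stdin, stdout and stderr, not files you open yourself:
def read(file):
    with open(file, encoding='utf8') as f:
        content = f.read()
    return content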
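For anyone who wants to verify this without touching their own files, here is a small self-contained sketch. The demo function and the temporary file are my own additions for illustration; the sample text is the one from the question. It writes a UTF-8 file, shows that decoding it as ASCII fails the same way as in the traceback, and that an explicit encoding reads it correctly regardless of locale or IDE settings.

```python
import os
import tempfile


def read(file):
    # Explicit encoding: the result no longer depends on the locale or the IDE.
    with open(file, encoding="utf-8") as f:
        return f.read()


def demo():
    """Hypothetical helper: returns (decoded text, whether ASCII decoding failed)."""
    text = "I am a file with evil unicode characters: äöü"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "testfile")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        # Forcing ASCII mimics an environment whose locale encoding is ASCII.
        try:
            with open(path, encoding="ascii") as f:
                f.read()
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True
        return read(path), ascii_failed


if __name__ == "__main__":
    content, ascii_failed = demo()
    print(content)
    print("ASCII decoding failed:", ascii_failed)
```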
Upvotes: 1