Reputation: 155
I have the following problem. I have a German text saved as a UTF-8 .txt file, and I'd like to print it with Python. Here's my code:
txt = open(filename, 'r').read()
print txt.decode('utf-8-sig')
It works perfectly in IDLE, but when I save my code and run it from the command prompt, it raises an error, specifically:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-4: character maps to <undefined>
In my particular case, the text is "gemäßigt", and at the beginning of the .py file I put something like
# -*- coding: utf-8-sig -*-
By the way, my OS is Windows, with the Russian locale. Does anybody have an idea what my problem is?
Best, Alex
Upvotes: 1
Views: 1003
Reputation: 3247
Is your text in UTF-8 or utf-8-sig? They are not exactly the same. You can read about the difference here: https://docs.python.org/3/library/codecs.html#encodings-and-unicode
You can also open the text file so that it is decoded as it is read:
import codecs
txt = codecs.open(filename,'r',"utf-8-sig").read()
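To make the difference concrete, here is a minimal sketch. It assumes the file starts with a UTF-8 byte order mark (BOM), which is what utf-8-sig is meant to handle; the sample bytes are built in memory for illustration rather than read from your file.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import codecs

# Illustrative bytes: a UTF-8 BOM followed by "gemäßigt" encoded as UTF-8.
raw = codecs.BOM_UTF8 + u"gem\u00e4\u00dfigt".encode("utf-8")

# Plain utf-8 keeps the BOM as a leading U+FEFF character.
as_utf8 = raw.decode("utf-8")

# utf-8-sig strips the BOM if it is present.
as_sig = raw.decode("utf-8-sig")

print(repr(as_utf8[0]))                 # the leftover BOM character
print(len(as_utf8) - len(as_sig))       # difference in length between the two
```

If the file has no BOM, both codecs should decode it to the same string, so utf-8-sig is a reasonable choice when you are not sure.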
I think Tim is correct about the console problem.
Upvotes: 0
Reputation: 336168
Your console uses the DOS code page 866, which doesn't have the characters ä or ß, causing the error.
You could try calling .encode('cp866', errors='replace') on your string before output, which replaces every character your terminal doesn't support with ?.
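A rough sketch of that suggestion, assuming the console really is using cp866 (on Windows, running chcp in the command prompt shows the active code page). The string is written with escapes so that the example does not depend on the source file's own encoding.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u"gem\u00e4\u00dfigt"  # "gemäßigt"

# Strict encoding fails because cp866 has no mapping for ä or ß.
try:
    text.encode("cp866")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors='replace', unsupported characters become '?' instead.
safe_bytes = text.encode("cp866", errors="replace")

# Decode back so the result is printable as text under Python 2 and 3.
printable = safe_bytes.decode("cp866")
print(printable)
```

This loses the two characters, so it only avoids the exception. To actually see ä and ß, the console itself would have to use a code page or font that supports them.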
Upvotes: 1