Alekz112

Reputation: 155

Problems with decoding from command prompt [python]

I have the following problem. I have a German text saved as a .txt file in UTF-8 format, and I'd like to print it out with Python. Here's my code:

txt = open(filename, 'r').read()
print txt.decode('utf-8-sig')

It works perfectly in IDLE, but when I save my code and run it from the command prompt, it raises an error, specifically:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-4: character maps to <undefined>

In my particular case, the text is "gemäßigt", and at the beginning of the .py file I put something like

# -*- coding: utf-8-sig -*-

By the way, my OS is a Russian-language Windows. Does anybody have an idea what my problem is?

Best, Alex

Upvotes: 1

Views: 1003

Answers (2)

Walle Cyril

Reputation: 3247

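Is your text in UTF-8 or utf-8-sig? They are not exactly the same: utf-8-sig expects a byte order mark (BOM) at the start of the data and strips it when decoding. The Python docs explain the difference: https://docs.python.org/3/library/codecs.html#encodings-and-unicode

You can also open the text file so that it is already decoded when you read it: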
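To make the difference concrete, here is a minimal sketch. It assumes the file was saved as "UTF-8 with BOM" and builds those bytes in memory instead of reading a real file; the variable names are only illustrative.

```python
import codecs

# The sample word from the question, written with escapes so the snippet
# does not depend on the encoding of the source file.
text = u"gem\u00e4\u00dfigt"

# Bytes as an editor would write them when saving "UTF-8 with BOM".
raw_with_bom = codecs.BOM_UTF8 + text.encode("utf-8")

# Plain utf-8 keeps the BOM as a leading U+FEFF character.
plain = raw_with_bom.decode("utf-8")

# utf-8-sig strips the BOM while decoding.
stripped = raw_with_bom.decode("utf-8-sig")

print(repr(plain[0]))
print(stripped == text)
```

Decoding with utf-8-sig also works on data that has no BOM, so it is a reasonable default when you don't know how the file was saved.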

import codecs
txt = codecs.open(filename,'r',"utf-8-sig").read()

I think Tim is correct about the console problem.

Upvotes: 0

Tim Pietzcker

Reputation: 336168

Your console uses the DOS code page 866, which doesn't have the characters ä or ß, causing the error.

You could try calling .encode('cp866', errors='replace') on your string before output, which replaces every character your terminal doesn't support with a ?.
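A rough sketch of that idea, assuming the console really is on cp866; console_safe is a made-up helper name, not something from the standard library. It round-trips the text through the console's codec so the result is still a unicode string:

```python
def console_safe(text, encoding="cp866"):
    """Hypothetical helper: replace characters the console codec lacks with '?'."""
    # errors="replace" substitutes '?' for anything the codec cannot encode;
    # decoding again gives back text instead of raw bytes.
    return text.encode(encoding, errors="replace").decode(encoding)


word = u"gem\u00e4\u00dfigt"  # "gemäßigt" from the question
safe = console_safe(word)
print(safe)
```

Characters that cp866 does support, such as Cyrillic letters, pass through unchanged; only ä and ß are replaced here.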

Upvotes: 1
