volting

Reputation: 18967

Python, Source-Code Encoding Problem

I'm using the Notepad++ editor on Windows with the format set to ASCII. I've read "PEP 263: Source Code Encodings" and amended my code accordingly (I think), but some characters are still printed as hex escapes...

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os, sys

a_munge = [ "A", "4", "/\\", "\@", "/-\\", "^", "aye", "?" ]
b_munge = [ "B", "8", "13", "I3", "|3" , "P>", "|:", "!3", "(3", "/3", "3","]3" ]
c_munge = [ "C", "<", "(", "{", "(c)" ]
d_munge = [ "D", "|)", "|o", "?", "])", "[)", "I>", "|>", " ?", "T)", "0", "cl" ]
e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]
         .
         .
         .

Upvotes: 0

Views: 761

Answers (3)

John Machin

Reputation: 82924

print some_list in effect prints the repr() of each element in the list, which is why you see \u20ac instead of a Euro character. For debugging purposes, the "unicode hex" is exactly what you need for an unambiguous display of your data.

You appear to have perfectly OK unicode objects in your list; I suggest that you don't "print" the list to Tkinter.
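
One way to act on this, as a minimal sketch: join the elements into a single Unicode string and display that string, rather than the list itself. The name display_text is only illustrative, and the list is a shortened copy of e_munge from the question, written with escapes so the snippet does not depend on how the file is saved.

```python
# -*- coding: utf-8 -*-
# Shortened copy of e_munge from the question, as Unicode strings.
e_munge = [u"E", u"3", u"&", u"\u20ac", u"\u00a3"]

# Joining the elements builds one Unicode string; no repr() of the list
# is involved, so the Euro and pound signs stay as real characters.
display_text = u", ".join(e_munge)
```

A string built this way can be handed to whatever displays the text, while repr() remains useful for inspecting individual elements during debugging.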

Upvotes: 1

Mark Tolonen

Reputation: 177461

The line:

# -*- coding: UTF-8 -*-

declares that the source file is saved in UTF-8. Saving the file in any other encoding is an error.

When you declare byte strings in your source code:

e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]

then byte strings like "€" will actually contain the encoded bytes used to save the source file.

When you use Unicode strings instead:

    e_munge = [ u"E", u"3", u"&", u"€", u"£", u"[-", u"|=-", u"?" ]

then, when Python reads u"€" from the source file, it uses the declared encoding to decode the bytes of that literal into a Unicode character.

An illustration:

# coding: utf-8
bs = '€'
us = u'€'
print repr(bs)
print repr(us)

OUTPUT:

'\xe2\x82\xac'
u'\u20ac'
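
The two outputs are related by the declared encoding: decoding the three bytes as UTF-8 gives the single code point. A small sketch of that round trip, using illustrative variable names that are not part of the question's code:

```python
# -*- coding: utf-8 -*-
# The three bytes UTF-8 uses for the Euro sign, as in the first output line.
euro_bytes = b"\xe2\x82\xac"

# Decoding with the declared source encoding yields one Unicode code point.
euro_text = euro_bytes.decode("utf-8")

# Encoding goes the other way and restores the original bytes.
round_trip = euro_text.encode("utf-8")
```

Here euro_text compares equal to u'\u20ac', which is the same object value that the u'€' literal produces when the file really is saved as UTF-8.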

Upvotes: 2

Ignacio Vazquez-Abrams

Reputation: 798486

Perhaps you should be using unicode literals (e.g. u'€') instead.

Upvotes: 2
