jester112358

Reputation: 465

Python lists with Scandinavian letters

Can anyone explain what causes this, so that I can better understand the environment?

emacs, unix

input:

with open("example.txt", "r") as f:
    for files in f:
        print files
        split = files.split()
        print split

output:

Hello world
['Hello', 'world']
Hello wörld
['Hello', 'w\xf6rld']

Upvotes: 4

Views: 254

Answers (3)

kamjagin

Reputation: 3654

In Python 2, printing a list does not print each string directly. Instead, the list calls __repr__ on each element, and for strings that representation escapes non-ASCII characters (so ö shows up as \xf6). If you print each element by itself (in which case the string's __str__ method is used, rather than the list's), you get what you expect.

with open("example.txt", "r") as f:
    for inp in f:
        files = inp.decode('latin-1')  # just to make sure this works on different systems
        print files
        split = files.split()
        print split
        print split[0]
        print split[1]

Output:

hello world

[u'hello', u'world']
hello
world
hello wörld
[u'hello', u'w\xf6rld']
hello
wörld
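
The same distinction can be reproduced without a file. The following is only a minimal sketch written for Python 3 (the code above is Python 2); the sample bytes and the variable names are assumptions, chosen to match the Latin-1 data in the question:

```python
# Python 3 sketch; the sample bytes are assumed to be Latin-1, as in the question.
raw_line = b"Hello w\xf6rld\n"

words = raw_line.split()                    # list of bytes objects
list_view = str(words)                      # the text that print(words) displays
element_reprs = [repr(w) for w in words]    # the list view is built from these

# Decoding first gives a list of text strings instead of raw bytes.
text_words = raw_line.decode("latin-1").split()

print(list_view)                                        # [b'Hello', b'w\xf6rld']
print(list_view == "[" + ", ".join(element_reprs) + "]")  # True
```

Here text_words[1] is the text string wörld, which prints readably on a terminal that can display the character.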

Upvotes: 2

Andreas Röhler

Reputation: 4804

With python-mode.el, after adapting the print forms for Python 3, py-execute-buffer-python3 prints nicely:

Hello world

['Hello', 'world']

Hello wörld

['Hello', 'wörld']

Upvotes: 0

Martijn Pieters

Reputation: 1121594

Python is printing the string representation, which includes a non-printable byte. Non-printable bytes (anything outside the ASCII range, or a control character) are displayed as escape sequences.

The point is that you can copy that representation and paste it into Python code or into the interpreter, producing the exact same value.
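
That round trip can be checked directly. This is a small sketch in Python 3 syntax, using the value from the question's output as an assumed sample; ast.literal_eval stands in for pasting the text back into the interpreter:

```python
import ast

original = b"w\xf6rld"              # sample value taken from the question's output
shown = repr(original)              # the text Python displays for this value
rebuilt = ast.literal_eval(shown)   # parse that text as a Python literal again

print(shown)                 # b'w\xf6rld'
print(rebuilt == original)   # True
```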

The \xf6 escape code represents a byte with hex value F6, which when interpreted as a Latin-1 byte value, is the ö character.
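
As a quick illustration in Python 3 syntax (the variable names are only for this sketch), the byte can be decoded as Latin-1 and compared with the same character encoded as UTF-8:

```python
raw = b"\xf6"                    # the single byte with hex value F6
char = raw.decode("latin-1")     # interpret the byte as Latin-1 text
code_point = ord(char)           # Latin-1 byte values map to the same code points

print(hex(raw[0]))               # 0xf6
print(code_point)                # 246
print(char.encode("utf-8"))      # b'\xc3\xb6'
```

The same character takes two bytes in UTF-8, so a lone \xf6 suggests the file uses Latin-1 or a closely related single-byte encoding, although that is an inference from the output rather than something the question states.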

You probably want to decode that value to Unicode to handle the data consistently. If you don't yet know what Unicode really is, or want to know anything else about encodings, see:

Upvotes: 10
