Calaf

Reputation: 10817

Why do unicode characters appear encoded within lists?

The output of the program

# -*- coding: utf-8 -*-
j = "Jürgen"
jlist = [j]
print j, type(j)
print jlist, type(jlist)

is

Jürgen <type 'str'>
['J\xc3\xbcrgen'] <type 'list'>

There is nothing wrong here; \xc3\xbc is just the UTF-8 encoding of ü. What I'm trying to understand is the difference: why do the OS X terminal (which otherwise handles UTF-8-encoded Unicode just fine) and the debugger (PyCharm) display the escaped bytes when the string is inside a list, but the actual character when the string is printed on its own?

Upvotes: 2

Views: 59

Answers (1)

zondo

Reputation: 20336

print uses str() to display whatever it is given, so print j writes the string's bytes to the terminal unchanged, and the terminal decodes them as ü. str(jlist), however, produces the string version of the list, and a list builds that by calling repr() on each element. repr() gives the raw, unambiguous form: a tab is displayed as \t rather than as whitespace, a newline as \n rather than as a line break, and, in Python 2, a non-ASCII byte as an escape such as \xc3. The reason is that if you are printing a list, it is probably for debugging or testing, and in those cases you really want to see exactly what each element contains.
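If you want to check this yourself, here is a minimal sketch. The names raw, tabbed, list_text and expected are made up for illustration; raw holds the same UTF-8 bytes that the str literal in the question holds under Python 2. It is written to run on both Python 2 and Python 3, so the exact repr() text differs slightly between them (Python 3 adds a b prefix to the bytes), but the relationship between str() of the list and repr() of its elements is the same.

```python
# -*- coding: utf-8 -*-
# Illustrative names only; not part of the original question.

# The UTF-8 bytes of "Jürgen" (what the Python 2 str literal contains).
raw = u"J\u00fcrgen".encode("utf-8")
# A string with a tab, to show the same effect with a control character.
tabbed = "a\tb"

# str() of a list is assembled from repr() of each element.
list_text = str([raw, tabbed])
expected = "[" + repr(raw) + ", " + repr(tabbed) + "]"

print(list_text)              # shows the \xc3\xbc and \t escapes
print(list_text == expected)  # the list text matches the element reprs
```

If the goal is readable output rather than debugging, printing the elements themselves avoids repr(); in Python 2 something like print ", ".join(jlist) should show Jürgen as expected.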

Upvotes: 2
