Encoding Characters not working on a list?

Question

I have a list like:

print alist
['G\xc3\xbcnther', 'Santher']

And want to change it to:

['Günther', 'Santher']

I tried a lot of stuff like:

alist=[s.encode("utf-8") for s in alist]
print alist
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

In other attempts, the word Günther gets lost, or G\xc3\xbcnther stays the same. What am I doing wrong?

Karol S · Accepted Answer

Everything works fine here; you are just assuming the wrong thing about what print does with a list.

Printing an object other than a string converts it to a string first. In this case, the list is converted to a string representing the Python expression that, when evaluated, would produce an equal list (its repr). This is the most useful way to display a list: you see exactly what is in there, although some characters are shown escaped.

Compare:

>>> a = ['test\'test\"test', 0, '0']

>>> print a[0]
test'test"test

>>> print a
['test\'test"test', 0, '0']

The letter ü is encoded in UTF-8 as two bytes: \xc3\xbc. Therefore, if you print the string 'G\xc3\xbcnther' in a UTF-8 terminal, you will see Günther. If you save it to a file and open that file in a decent text editor, it will display Günther (maybe you'll have to poke the encoding setting a bit). For all intents and purposes, this is the best way to store the word “Günther” in a bytestring.
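The two-byte claim is easy to check. The following is a minimal sketch, written with b'' and u'' literals so that it should behave the same under Python 2.6+ and Python 3; the variable names are illustrative only:

```python
# U+00FC is the u-umlaut; encoding it to UTF-8 yields two bytes.
u_umlaut = u'\xfc'
encoded = u_umlaut.encode('utf-8')

# The byte string from the question, and the text it decodes to.
raw = b'G\xc3\xbcnther'
text = raw.decode('utf-8')

# The byte string is one element longer than the decoded text,
# because the umlaut occupies two bytes but is a single character.
byte_count = len(raw)
char_count = len(text)
```

The byte string and the decoded text hold the same word; they differ only in whether the umlaut is stored as two bytes or as one character.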

If you want to print a list in a nice manner, it's up to you to format it yourself. For example, if it's a list of strings, as in your example, join works nicely:

>>> print '; '.join(['G\xc3\xbcnther', 'Santher'])
Günther; Santher

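(By the way: you can't encode a bytestring; it's already encoded. You can, however, decode it. In Python 2, calling encode on a bytestring first decodes it implicitly with the ascii codec, which is where the UnicodeDecodeError in your attempt comes from.)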
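As a minimal sketch of the decoding direction, assuming the list elements are valid UTF-8 bytes as in the question (the names decoded and line are made up for illustration, and the b'' and u'' prefixes are there so the snippet should run under both Python 2.6+ and Python 3):

```python
# Byte strings as in the question.
alist = [b'G\xc3\xbcnther', b'Santher']

# Decode each element from UTF-8 to obtain text (unicode) strings.
decoded = [s.decode('utf-8') for s in alist]

# Build one text string for display; printing the list itself
# would show the repr of each element, escapes included.
line = u'; '.join(decoded)
```

Printing line on a terminal that can display the umlaut should show the names unescaped, whereas printing decoded under Python 2 would still show an escaped repr for the first element.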
