Reputation: 179
I have a list like:
print alist
['G\xc3\xbcnther', 'Santher']
And want to change it to:
['Günther', 'Santher']
I tried a lot of stuff like:
alist=[s.encode("utf-8") for s in alist]
print alist
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
In other attempts the word Günther gets lost, or G\xc3\xbcnther stays the same. What am I doing wrong?
Upvotes: 0
Views: 214
Reputation: 9402
Everything works fine here; you are just expecting the wrong thing from the API.
Printing an object other than a string converts it to a string first. For a list, that string is the Python expression that, when entered, would evaluate to an equal list. This is the most useful way to display a list: you see exactly what is in it, though some characters may appear escaped.
Compare:
>>> a = ['test\'test\"test', 0, '0']
>>> print a[0]
test'test"test
>>> print a
['test\'test"test', 0, '0']
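The same mechanism can be checked in a minimal Python 3 sketch. Python 3 is an assumption here: the question uses Python 2, so `b''` bytes literals stand in for its bytestrings. Converting a list to a string builds it from `repr()` of each element, which is why the UTF-8 bytes show up as `\x..` escapes.

```python
# Python 3 sketch: the question's bytestrings written as bytes literals.
alist = [b'G\xc3\xbcnther', b'Santher']

# str() of a list is built from repr() of each element, joined with ", ".
as_text = str(alist)
built_by_hand = '[' + ', '.join(repr(item) for item in alist) + ']'

print(as_text)                   # [b'G\xc3\xbcnther', b'Santher']
print(as_text == built_by_hand)  # True
```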
The letter ü is encoded in UTF-8 as two bytes: \xc3\xbc. Therefore, if you print the string 'G\xc3\xbcnther' in a UTF-8 terminal, you will see Günther. If you save it to a file and open that file in a decent text editor, it will display Günther (maybe you'll have to poke the encoding setting a bit). For all intents and purposes, this is the best way to store the word “Günther” in a bytestring.
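A small Python 3 sketch of both claims: ü really is two bytes in UTF-8, and those bytes written to a file read back as Günther when decoded as UTF-8, as a text editor set to UTF-8 would do. The temporary file is only for illustration.

```python
import os
import tempfile

# 'ü' (U+00FC) becomes two bytes in UTF-8.
encoded_u = '\u00fc'.encode('utf-8')
print(encoded_u)       # b'\xc3\xbc'
print(len(encoded_u))  # 2

# Write the raw bytes to a file, then read them back as UTF-8 text,
# the way an editor set to UTF-8 would interpret them.
raw = b'G\xc3\xbcnther'
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(raw)
    with open(path, encoding='utf-8') as f:
        text = f.read()
finally:
    os.remove(path)

print(text == 'G\u00fcnther')  # True
```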
If you want to print a list in a nice manner, it's up to you to format it yourself. For example, if it's a list of strings, as in your example, join would work nicely:
>>> print '; '.join(['G\xc3\xbcnther', 'Santher'])
Günther; Santher
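For comparison, a Python 3 sketch, assuming the same UTF-8 bytes: decoding first gives text objects that `join` can format. In Python 3, `repr()` of a `str` keeps printable non-ASCII characters, so the list's own representation becomes readable too (Python 2's `repr` of a `unicode` string would still escape the ü).

```python
alist = [b'G\xc3\xbcnther', b'Santher']

# Decode each UTF-8 bytestring into a text (str) object.
names = [s.decode('utf-8') for s in alist]

# Format the list yourself, as with the join above.
joined = '; '.join(names)

# Python 3's repr() of a str leaves printable characters like 'ü' alone,
# so the default list representation is readable as well.
shown = str(names)

print(joined == 'G\u00fcnther; Santher')       # True
print(shown == "['G\u00fcnther', 'Santher']")  # True
```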
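(By the way: you can't meaningfully encode a bytestring, since it's already encoded. In Python 2, calling .encode() on one first decodes it implicitly with the ASCII codec, which is where your UnicodeDecodeError comes from. You can, however, decode it.)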
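A Python 3 sketch of which direction each operation goes. Python 3 separates `bytes` from `str`, unlike the Python 2 code in the question, so the confusing "encode a bytestring" case cannot even be written there.

```python
raw = b'G\xc3\xbcnther'      # already-encoded UTF-8 bytes

text = raw.decode('utf-8')   # bytes -> str: the direction that makes sense
back = text.encode('utf-8')  # str -> bytes: encoding applies to text

print(len(raw), len(text))   # 8 7
print(back == raw)           # True

# In Python 3, bytes objects have no encode() method at all.
print(hasattr(raw, 'encode'))  # False
```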
Upvotes: 2
Reputation: 11
Your code displays the representational form (repr) of the list; to see an element in string form, use this:
print alist[0]
Python stores the characters the same way in both cases; the escapes are only how the list's representation displays them, and there is no way to change that :)
Upvotes: 1