I have this list:
l = [u'\xf9', u'!']
And I want to convert it into this list:
l2 = ['ù','!']
How can I do it? And why does l.encode() not work?
Upvotes: 1
Views: 628
Reputation: 52000
Are you using Python 2? If so, you might be fooled by the way Python displays strings.
As you noticed, '\xc3\xb9' is the UTF-8 encoded representation of code point U+00F9 ('ù'). So:
# code point
>>> u'ù'
u'\xf9'
# seems wrong?
>>> u'ù'.encode('utf-8')
'\xc3\xb9'
# No, not at all (at least on my UTF-8 terminal)
>>> print(u'ù'.encode('utf-8'))
ù
Given your example:
>>> l = [u'\xf9', u'!']
>>> print(l)
[u'\xf9', u'!']
>>> l[0]
u'\xf9'
>>> print(l[0])
ù
>>> l2 = [u.encode('utf-8') for u in l]
>>> l2
['\xc3\xb9', '!']
>>> print(l2)
['\xc3\xb9', '!']
>>> print(l2[0])
ù
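As for why l.encode() does not work: encode is a method of string objects, not of lists, so it has to be applied to each element in turn. Here is a minimal sketch of that point. It should run under both Python 2.7 and Python 3.3+, and the names `error` and `encoded` are only illustrative:

```python
# -*- coding: utf-8 -*-
l = [u'\xf9', u'!']

# A list has no encode() method; only its string elements do.
try:
    l.encode('utf-8')
    error = None
except AttributeError as exc:
    error = exc

# Encode element by element instead.
encoded = [s.encode('utf-8') for s in l]

# str() of a list is built from the repr() of each element, which is
# why escape sequences show up when you print the whole list.
same_display = (str(l) == repr(l))

print(type(error).__name__)
print(same_display)
```

Printing a single element goes through the string itself rather than its repr(), which is why print(l[0]) shows the character while print(l) shows the escape.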
I agree all of this is rather inconsistent and a source of frustration. That's why string/Unicode support was a major rewrite in Python 3. There:
# Python 3
>>> l = [u'\xf9', u'!']
>>> l
['ù', '!']
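Python 3 also keeps text and encoded data strictly apart: str holds code points and bytes holds the encoded form. Below is a short sketch of the round trip, assuming UTF-8 as the encoding (the variable names are illustrative, not part of the question):

```python
# Python 3 only
l = ['\xf9', '!']                               # same value as ['ù', '!']

encoded = [s.encode('utf-8') for s in l]        # list of bytes objects
decoded = [b.decode('utf-8') for b in encoded]  # back to a list of str

print(ascii(l))       # ascii() escapes non-ASCII characters: ['\xf9', '!']
print(encoded)        # [b'\xc3\xb9', b'!']
print(decoded == l)   # True
```

So in Python 3 the list from the question already is the list you want; encoding is only needed when the text has to leave the program as bytes (to a file, a socket, and so on).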
Upvotes: 1