Reputation: 238
I'm parsing an XML file with xml.etree, and one of the elements contains this string:
Exécutive
I tried pretty much everything to figure out how to transform it to its real value:
Exécutive
I tried the following:
>>> s = 'é'
>>> s
'\xc3\x83\xc2\xa9'
>>> print s
é
>>> type(s)
<type 'str'>
>>> s.decode('iso-8859-1')
u'\xc3\x83\xc2\xa9'
>>> print( s.decode('iso-8859-1').encode('utf-8'))
é
>>> print( s.decode('utf-8'))
é
I'm kind of lost here with these encodings. Anyone for a little help?
Thanks in advance
Upvotes: 2
Views: 3233
Reputation: 201618
The data is apparently UTF-8 encoded data (e.g., “é” is two bytes) misinterpreted as ISO-8859-1. For the test case, the following produces the output “Exécutive”:
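# This Python file uses the following encoding: utf-8
s = 'Exécutive'
# Decoding once as UTF-8 only yields u'Exécutive' (as the question's session shows).
# Map those characters back to bytes via ISO-8859-1, then decode as UTF-8 again.
print s.decode('utf-8').encode('iso-8859-1').decode('utf-8')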
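For comparison, here is a rough Python 3 sketch of the same misinterpretation and its reversal, where text and bytes are separate types. It assumes the damage is exactly one round of UTF-8 bytes being read as ISO-8859-1; the helper name `fix_mojibake` is made up for this illustration.

```python
def fix_mojibake(text):
    """Undo one round of UTF-8 bytes misread as ISO-8859-1 (hypothetical helper)."""
    # Map each character back to the byte it was misread from,
    # then decode those bytes as the UTF-8 they really were.
    return text.encode('iso-8859-1').decode('utf-8')

# Reproduce the damage: UTF-8 bytes of the real word, misread as ISO-8859-1.
original = 'Exécutive'
garbled = original.encode('utf-8').decode('iso-8859-1')

# Reversing the misread recovers the original text.
repaired = fix_mojibake(garbled)
assert repaired == original
```

If the text was damaged more than once, or through a different codec such as Windows-1252, this helper may raise an error or return the wrong result, so it is better treated as a diagnostic than as a general fix.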
In processing the XML file, you probably just need to read it as UTF-8.
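As a minimal sketch of what "read it as UTF-8" could look like with xml.etree in Python 3: the parser is handed raw bytes and left to honour the encoding declaration itself. The document and the `role` element below are invented stand-ins, not the asker's real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the real file; the bytes are UTF-8.
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<role>Exécutive</role>'
).encode('utf-8')

# Passing bytes lets the parser apply the declared encoding,
# so no manual decode step is needed beforehand.
root = ET.fromstring(xml_bytes)
text = root.text
```

The same idea should apply to `ET.parse` with a file name, since it opens the file in binary mode. Garbling of this kind usually comes from decoding the bytes with the wrong codec somewhere before or after the parser sees them.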
Upvotes: 2