Reputation: 2900
Sorry for the generic title.
I am receiving a string from an external source: txt = external_func()
I am copying/pasting the output of various commands to make sure you see what I'm talking about:
In [163]: txt
Out[163]: '\\xc3\\xa0 voir\\n'
In [164]: print(txt)
\xc3\xa0 voir\n
In [165]: repr(txt)
Out[165]: "'\\\\xc3\\\\xa0 voir\\\\n'"
I am trying to transform that text to UTF-8 (?) to have txt = "à voir\n"
, and I can't see how.
How can I do transformations on this variable?
Upvotes: 2
Views: 116
Reputation: 5011
You can encode your txt
to a bytes-like object using the encode-method of the str class.
Then this byte-like object can be decoded again with the encoding unicode_escape
.
Now you have your string with all escape sequences parsed, but latin-1
decoded. You still have to encode it with latin-1
and then decode it again with utf-8
.
>>> txt = '\\xc3\\xa0 voir\\n'
>>> txt.encode('utf-8').decode('unicode_escape').encode('latin-1').decode('utf-8')
'à voir\n'
The codecs
module also has an undocumented funciton called escape_decode
:
>>> import codecs
>>> codecs.escape_decode(bytes('\\xc3\\xa0 voir\\n', 'utf-8'))[0].decode('utf-8')
'à voir\n'
Upvotes: 3