Be Chiller Too

Reputation: 2900

Decode a Python string

Sorry for the generic title.

I am receiving a string from an external source: txt = external_func()

I am copying/pasting the output of various commands to make sure you see what I'm talking about:

In [163]: txt
Out[163]: '\\xc3\\xa0 voir\\n'

In [164]: print(txt)
\xc3\xa0 voir\n

In [165]: repr(txt)
Out[165]: "'\\\\xc3\\\\xa0 voir\\\\n'"

I am trying to transform that text to UTF-8 (?) to have txt = "à voir\n", and I can't see how.

How can I do transformations on this variable?

Upvotes: 2

Views: 116

Answers (1)

kalehmann

Reputation: 5011

You can encode txt to a bytes object using the encode method of the str class. That bytes object can then be decoded with the unicode_escape codec.

Now all escape sequences are parsed, but unicode_escape decodes as Latin-1, so each \xNN escape has become a separate character instead of one byte of a UTF-8 sequence. To fix that, encode the result with latin-1 to get the raw bytes back, then decode those bytes with utf-8.

>>> txt = '\\xc3\\xa0 voir\\n'
>>> txt.encode('utf-8').decode('unicode_escape').encode('latin-1').decode('utf-8')
'à voir\n'
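
To see what each step of the chain does, the calls can be split apart. This is only a sketch; the intermediate names (as_bytes, parsed, raw, result) are chosen for illustration:

```python
txt = '\\xc3\\xa0 voir\\n'  # the literal backslash text from the question

# 1. str -> bytes: the backslashes are still literal characters.
as_bytes = txt.encode('utf-8')

# 2. Parse the escape sequences. unicode_escape decodes as Latin-1, so
#    \xc3 and \xa0 become two separate characters (U+00C3 and U+00A0).
parsed = as_bytes.decode('unicode_escape')

# 3. Latin-1 maps code points 0-255 to the same byte values, which
#    recovers the raw bytes 0xC3 0xA0.
raw = parsed.encode('latin-1')

# 4. Those two bytes are the UTF-8 encoding of 'à'.
result = raw.decode('utf-8')

print(repr(result))  # 'à voir\n'
```

Note that step 3 raises UnicodeEncodeError if the input also contains escapes for characters above U+00FF (for example \u20ac), so this chain only fits input that escapes raw UTF-8 bytes, as in the question.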

The codecs module also has an undocumented function called escape_decode:

>>> import codecs
>>> codecs.escape_decode(bytes('\\xc3\\xa0 voir\\n', 'utf-8'))[0].decode('utf-8')
'à voir\n'
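
Because escape_decode is undocumented, it may be worth keeping it behind a small wrapper so it is easy to swap out later. A sketch, where the helper name unescape_bytes is hypothetical; escape_decode returns a tuple whose first item is the decoded bytes, which is why the example above uses [0]:

```python
import codecs


def unescape_bytes(text: str) -> str:
    """Hypothetical helper: parse literal \\xNN escapes, then decode as UTF-8."""
    # escape_decode works on bytes and returns a tuple; the first item
    # is the unescaped data, the second is not needed here.
    decoded, _ = codecs.escape_decode(text.encode('utf-8'))
    return decoded.decode('utf-8')


print(repr(unescape_bytes('\\xc3\\xa0 voir\\n')))  # 'à voir\n'
```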

Upvotes: 3
