How can I generate \x-escaped UTF-8 in Python?

Question

I want to convert a unicode input to a \x escaped, 7-bit-ascii-clean reprentation of a UTF-8 byte sequence.

This is analogous to what I need, but instead of "\u222a" I would like to generate "\xe2\x88\xaa"

>>> codecs.encode(u"\u222A", 'ascii', 'backslashreplace')
'\u222a'

This looks like it is generating the desired result:

>>> u"\u222A".encode('utf-8')
'\xe2\x88\xaa'

But that is merely an escaped representation. The actual result isn't 12 ascii bytes, it's 3 UTF-8 bytes:

>>> [ord(c) for c in u"\u222A".encode('utf-8')]
[226, 136, 170]

I could abuse that escaped representation to get what I want, stripping off the leading and trailing quote that repr adds:

>>> repr(u"\u222A".encode('utf-8'))[1:-1]
'\xe2\x88\xaa'
>>> [ord(c) for c in repr(u"\u222A".encode('utf-8'))[1:-1]]
[92, 120, 101, 50, 92, 120, 56, 56, 92, 120, 97, 97]

Yuck. This is a little better:

>>> import binascii
>>> ''.join('\x' + binascii.hexlify(c) for c in u"\u222A".encode('utf-8'))
'\xe2\x88\xaa'
>>> [ord(c) for c in ''.join('\x' + binascii.hexlify(c) for c in u"\u222A".encode('utf-8'))]
[92, 120, 101, 50, 92, 120, 56, 56, 92, 120, 97, 97]

Is a better way to do this?

Ignacio Vazquez-Abrams · Accepted Answer

>>> u'\u222A'.encode('utf-8').encode('string-escape')
'\xe2\x88\xaa'
>>> print u'\u222A'.encode('utf-8').encode('string-escape')
\xe2\x88\xaa

How can I generate \x-escaped UTF-8 in Python?

Answers (2)

Related Questions