Converting a unicode string to see unicode code points

Question

I have this:

>>> su = u'"/\"'

In python, how can I convert this to a representation that shows the unicode code points? That would be this for the string above

u'\u0022\u002F\u005C\u0022'

Mark Tolonen · Accepted Answer

Your original string is not four characters but three, because \" is an escape code for a double quote:

>>> su = u'"/\"'
>>> len(su)
3

Here's how to display it as escape codes:

>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u0022'

Use a Unicode raw string, or double backslashes to escape the slash and get four characters:

>>> su = ur'"/\"' # Raw version
>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u005C\\u0022'

>>> su = u'"/\\"' # Escaped version
>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u005C\\u0022'
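
The sessions above are Python 2. As a rough sketch of the same idea in Python 3 (where every str is Unicode and the u/ur prefixes are not needed), the join can be wrapped in a small helper. The name to_codepoint_escapes is made up for illustration, and the \UXXXXXXXX branch for code points above U+FFFF is an assumption that goes beyond the original answer:

```python
def to_codepoint_escapes(text):
    """Return text with every character written as a \\uXXXX or \\UXXXXXXXX escape."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            # Four hex digits are enough for the Basic Multilingual Plane.
            parts.append('\\u{:04X}'.format(cp))
        else:
            # Assumption: use the eight-digit form for astral code points.
            parts.append('\\U{:08X}'.format(cp))
    return ''.join(parts)

print(to_codepoint_escapes('"/\\"'))  # \u0022\u002F\u005C\u0022
```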

Note the double backslashes in the result. Each pair stands for a single literal backslash. With only one backslash they would be escape codes, no different from your original string:

>>> ur'"/\"' == u'\u0022\u002F\u005C\u0022'
True
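
One way to check this, sketched in Python 3 rather than the Python 2 used above: the generated text is plain ASCII, so decoding it with the standard unicode_escape codec should give back the original four characters. The variable names are only illustrative:

```python
original = r'"/\"'  # four characters: " / \ "

# Build the literal text \u0022\u002F\u005C\u0022 (24 characters, all ASCII).
escaped = ''.join('\\u{:04X}'.format(ord(c)) for c in original)

# Interpret the \uXXXX sequences as escape codes again.
restored = escaped.encode('ascii').decode('unicode_escape')

print(len(escaped))          # 24
print(restored == original)  # True
```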

Printing it shows the content of the strings:

>>> print u'\u0022\u002F\u005C\u0022'
"/\"
>>> print(''.join(u'\\u{:04X}'.format(ord(c)) for c in su))
\u0022\u002F\u005C\u0022
