Ankur Agarwal

Reputation: 24768

Converting a unicode string to see unicode code points

I have this:

>>> su = u'"/\"'

In Python, how can I convert this to a representation that shows the Unicode code points? For the string above, that would be:

u'\u0022\u002F\u005C\u0022'

Upvotes: 0

Views: 743

Answers (2)

jfs

Reputation: 414335

To support the full Unicode range, you could use the unicode-escape codec to get the text representation. To also represent characters in the ASCII range as Unicode escapes, and to force the \u00xx form even for characters such as u'\xff', you could use a regex:

#!/usr/bin/env python2
import re

su = u'"/"\U000af600'
assert u'\ud800' not in su # no lone surrogate
print re.sub(ur'[\x00-\xff]', lambda m: u"\ud800u%04x" % ord(m.group()), su, 
             flags=re.U).encode('unicode-escape').replace('\\ud800', '\\')

A lone surrogate (U+D800) is used as a placeholder to avoid escaping the backslash twice.

Output

\u0022\u002f\u0022\U000af600
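For comparison, here is a minimal sketch of the same idea without a regex, assuming Python 3 (where a str is already a sequence of code points, so no surrogate placeholder is needed). The function name escape_code_points is hypothetical, not something from the code above.

```python
def escape_code_points(text):
    # Write every character as an escape: 4 hex digits for the BMP,
    # 8 hex digits for code points above U+FFFF.
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            parts.append('\\u{:04x}'.format(cp))
        else:
            parts.append('\\U{:08x}'.format(cp))
    return ''.join(parts)


print(escape_code_points('"/"\U000af600'))
# prints: \u0022\u002f\u0022\U000af600
```

This produces the same text as the Python 2 version for this input.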

Upvotes: 1

Mark Tolonen

Reputation: 177735

Your original string is not four characters but three because \" is an escape code for a double quote:

>>> su = u'"/\"'
>>> len(su)
3
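To see exactly which characters the literal contains, you can list the code points. This is a small sketch assuming Python 3 syntax (no u prefix needed); the variable names are only for illustration.

```python
plain = '"/\"'    # \" is an escape for a double quote, so the backslash disappears
raw = r'"/\"'     # raw literal keeps the backslash as a character

plain_points = [hex(ord(c)) for c in plain]
raw_points = [hex(ord(c)) for c in raw]

print(len(plain), plain_points)  # 3 ['0x22', '0x2f', '0x22']
print(len(raw), raw_points)      # 4 ['0x22', '0x2f', '0x5c', '0x22']
```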

Here's how to display it as escape codes:

>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u0022'

Use a raw Unicode string, or double the backslash to escape it, to get four characters:

>>> su = ur'"/\"' # Raw version
>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u005C\\u0022'

>>> su = u'"/\\"' # Escaped version
>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u005C\\u0022'

Note the double backslash in the result: in a repr, it indicates a single literal backslash. With a single backslash, these would be escape codes, no different from your original string:

>>> ur'"/\"' == u'\u0022\u002F\u005C\u0022'
True

Printing it shows the content of the strings:

>>> print u'\u0022\u002F\u005C\u0022'
"/\"
>>> print(''.join(u'\\u{:04X}'.format(ord(c)) for c in su))
\u0022\u002F\u005C\u0022
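As a sanity check, the escaped text can be decoded back to the original string. This is a sketch assuming Python 3 and the standard unicode_escape codec; the variable names are illustrative.

```python
su = '"/\\"'  # four characters: quote, slash, backslash, quote

# Build the escaped text, one \uXXXX per character.
escaped = ''.join('\\u{:04X}'.format(ord(c)) for c in su)
print(escaped)  # \u0022\u002F\u005C\u0022

# Decode the escapes again; the result should equal the original.
restored = escaped.encode('ascii').decode('unicode_escape')
print(restored == su)  # True
```

One caveat: in Python 3, a character above U+FFFF formatted with {:04X} yields five hex digits after \u, which does not decode back correctly, so this round trip only holds for characters up to U+FFFF.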

Upvotes: 5
