Reputation: 7916
I have a long list of Unicode definitions and description mappings that use the 'U+1F49A' coding convention.
In Python 3, how can I read these in as true Unicode characters (i.e. '\U0001F49A' or '💚')?
I've tried slicing and string composition, e.g. '\U000{}'.format('1F49A'),
but end up with SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-4: truncated \UXXXXXXXX escape
because the string literal itself fails to parse on the partial Unicode escape, before format() is ever called.
Upvotes: 6
Views: 2506
Reputation: 168716
You can use int() to parse the hexadecimal number, and chr() to convert the number to a single-character string.
For example:
In [8]: chr(0x1f49a)
Out[8]: '💚'
In [9]: s = 'U+1F49A'
In [10]: chr(int(s[2:], 16))
Out[10]: '💚'
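Since the question mentions a list of definitions with descriptions, here is a minimal sketch of applying the same chr()/int() step line by line. The input layout (one 'U+XXXX description' pair per line, separated by a space) and the names parse_codepoint, load_mapping, and sample are assumptions for illustration, not something given in the question.

```python
def parse_codepoint(token: str) -> str:
    """Convert a 'U+XXXX' token into the character it names."""
    if not token.upper().startswith("U+"):
        raise ValueError(f"expected a 'U+' prefix, got {token!r}")
    # Drop the 'U+' prefix and parse the rest as a base-16 integer.
    return chr(int(token[2:], 16))


def load_mapping(lines):
    """Build {character: description} from lines like 'U+1F49A GREEN HEART'.

    The line layout is an assumed format; adjust the split to your data.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token, _, description = line.partition(" ")
        mapping[parse_codepoint(token)] = description.strip()
    return mapping


# Hypothetical sample data standing in for the real list.
sample = ["U+1F49A GREEN HEART", "U+2764 HEAVY BLACK HEART"]
emoji_names = load_mapping(sample)
```

With real data you would pass an open file object instead of the sample list, since iterating a file also yields one line at a time.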
If you want to convert all of the U+xxxx instances in a larger string, you can use the same chr()/int() pattern in the second argument of re.sub():
In [13]: import re
In [14]: s = 'U+1F49A -vs- U+2764'
In [15]: re.sub(r'U\+([0-9a-fA-F]+)', lambda m: chr(int(m.group(1), 16)), s)
Out[15]: '💚 -vs- ❤'
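One caveat with the pattern above: chr() raises ValueError for values beyond 0x10FFFF, so a stray long hex run after 'U+' would abort the whole substitution. A slightly more defensive sketch follows, assuming you would rather leave such tokens untouched; the names CODEPOINT and expand_codepoints, and the 4 to 6 digit limit, are choices made here for illustration.

```python
import re

# Match 'U+' followed by 4 to 6 hex digits (assumed limit for this sketch).
CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def expand_codepoints(text: str) -> str:
    """Replace every 'U+XXXX' token in text with the character it names."""

    def replace(match):
        value = int(match.group(1), 16)
        if value > 0x10FFFF:
            # Outside the Unicode range: keep the original token as-is.
            return match.group(0)
        return chr(value)

    return CODEPOINT.sub(replace, text)


converted = expand_codepoints("U+1F49A -vs- U+2764")
```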
Upvotes: 13