Reputation: 440
I am sure this question has already been asked, so forgive me for the duplicate.
Python's chr()
function returns the unicode string representation of 1 ordinal value. How can I return a unicode string of a string of ordinals? For example:
john:
j - 106
o - 111
h - 104
n - 110
The full unicode string is: 106111104110
My current method is:
from textwrap import wrap
ct = "106111104110" # unicode string
Split = wrap(ct,3) # split into threes list
inInt = list(map(int, Split)) # convert list of string into list of int
answer=''.join([chr(num) for num in inInt]) # return unicode string for each 3 character string
print(answer)
The above works correctly, printing "john".
However this does not work when the unicode for the value is less than 3 characters, or less than 100. For example:
apple:
a - 97
p - 112
p - 112
l - 108
e - 101
The full unicode string is: 97112112108101
However doing:
ct="97112112108101"
Split = wrap(ct,3)
inInt = list(map(int, Split))
answer=''.join([chr(num) for num in inInt])
print(answer)
will print ϋyyQ
because the unicode of a
is 97, which is only 2 characters. I would like to not be constricted to using only characters over 100.
Is there a python library that has the functionality I am looking for? Many thanks in advance.
Upvotes: 0
Views: 522
Reputation: 177991
Unicode code points can be up to six hexadecimal digits or seven decimal digits, so you could use leading zeros for consistency:
>>> ''.join(format(ord(x),'06x') for x in 'john')
'00006a00006f00006800006e'
>>> ''.join(chr(int(_[i:i+6],16)) for i in range(0,len(_),6)) # _ gets previous result from REPL.
'john'
>>> ''.join(format(ord(x),'06x') for x in '你好吗')
'004f6000597d005417'
>>> ''.join(chr(int(_[i:i+6],16)) for i in range(0,len(_),6))
'你好吗'
However, typical encoding is performed on byte strings, so encode to UTF-8 first, then you can use bytes
methods to get two-digit hex strings:
>>> 'apple'.encode('utf8').hex()
'6170706c65'
>>> bytes.fromhex(_).decode()
'apple'
>>> '你好吗'.encode('utf8').hex()
'e4bda0e5a5bde59097'
>>> bytes.fromhex(_).decode('utf8')
'你好吗'
Upvotes: 2