Reputation: 377
I'm using the code below to convert Arabic text to Unicode escape sequences in UTF-16 form.
For example, I have the Arabic text مرحبا:
unicode = ''.join([hex(ord(i)) for i in t.text])
This code produces the string 0x6450x6310x62d0x6280x627.
The format in which I need Unicode is \u0645\u0631\u062d\u0628\u0627
I want to replicate this website
With the above approach I use the replace method to convert the 0x format to the \u0 format. That does not handle special characters (spaces, punctuation, digits) as expected, so I have to add a separate replace call for each one:
unicode = str(unicode).replace('0x', '\\u0')
unicode = str(unicode).replace('\\u020', ' ') #For Space
unicode = str(unicode).replace('\\u02e', '\\u002e') #For .
unicode = str(unicode).replace('\\u022', '\\u0022') #For "
unicode = str(unicode).replace('\\u07d', '\\u007d') #For }
unicode = str(unicode).replace('\\u030', '\\u0030') #For 0
unicode = str(unicode).replace('\\u07b', '\\u007b') #For {
unicode = str(unicode).replace('\\u031', '\\u0031') #For 1
Python's built-in UTF-16 encoding doesn't give the \u0 format either; it returns raw bytes:
print("مرحبا".encode('utf-16'))
b"\xff\xfeE\x061\x06-\x06(\x06'\x06"
How can I get the result in the \u0 format, the way this website provides it for UTF-16?
Thanks.
Upvotes: 0
Views: 587
Reputation: 33830
This problem is just about how you represent the hex value. To get the string in the representation you want, you can use:
In [84]: text = "مرحبا"
In [85]: print(''.join([f'\\u{ord(c):0>4x}' for c in text]))
\u0645\u0631\u062d\u0628\u0627
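The same expression should also cover the spaces and punctuation from the question, since the zero padding makes every code point below U+10000 exactly four hex digits wide and no per-character replace calls are needed. A minimal sketch; the helper name to_u_escapes and the sample string are made up for illustration:

```python
def to_u_escapes(text: str) -> str:
    """Format each character's code point as a zero-padded \\uXXXX escape."""
    # 0>4x: hex, right-aligned, padded on the left with zeroes to 4 digits.
    return "".join(f"\\u{ord(c):0>4x}" for c in text)


# Hypothetical sample containing the characters the question patched by hand.
sample = '{ "0.1" }'
print(to_u_escapes(sample))
print(to_u_escapes("مرحبا"))
```

Note that this formats code points, not UTF-16 code units, so it only matches UTF-16 for characters up to U+FFFF.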
Consider the first character of the text, م:
In [86]: ord(text[0])
Out[86]: 1605
It has integer (decimal) value 1605. This in hex is 645:
In [87]: hex(ord(text[0]))
Out[87]: '0x645'
You can also use string formatting (for example f-strings in Python 3.6+) to show it as \u0645:
In [88]: f'\\u{ord(text[0]):0>4x}'
Out[88]: '\\u0645'
The x in the format string means "hex". The 0>4 means right-align the value in a field at least 4 characters wide and pad it on the left with zeroes.
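One caveat if the goal is strictly UTF-16: characters above U+FFFF (most emoji, for example) take two UTF-16 code units, a surrogate pair, while ord() gives a single code point with five or more hex digits. A possible sketch that goes through the actual UTF-16 encoding instead; the function name to_utf16_escapes is an assumption for illustration, not something from the question or the website:

```python
def to_utf16_escapes(text: str) -> str:
    """Return one \\uXXXX escape per UTF-16 code unit of text."""
    # 'utf-16-be' writes no BOM and keeps each unit's bytes in reading order.
    data = text.encode("utf-16-be")
    # Every UTF-16 code unit is exactly two bytes.
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"\\u{unit:04x}" for unit in units)


print(to_utf16_escapes("مرحبا"))       # same result as the ord() version
print(to_utf16_escapes("\U0001F600"))  # one emoji becomes a surrogate pair
```

For text that stays within the Basic Multilingual Plane, such as Arabic, both approaches should give identical output, so the simpler ord() version is enough there.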
Upvotes: 2