Reputation: 8468
I know that \xd9\x88
is utf-8 code for the letter و
in Arabic (you can see this page).
I have a file that contains a list of such utf-8 characters, how can I represent them in Arabic characters, for example represent \xd9\x88
with و
?
On Python 3 if I do:
>>> i = '\xd9\x88'
>>> print(i)
Ù
Upvotes: 0
Views: 5079
Reputation: 174624
If you want to print the character, simply use print()
; but you have to make sure your terminal supports the encoding and is using a font that has that glyph.
In the Windows command prompt, with the default encoding (that doesn't support Arabic), you'll see this:
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> i = "\xd9\x88"
>>> print(i)
و
>>>
On Linux, using UTF-8 as the default encoding and using a font that has the Arabic glyphs, you'll see this:
>>> i = "\xd9\x88"
>>> print(i)
و
>>>
Back on Windows, if you use a text editor that supports UTF-8 (in this case, I am using Sublime Text), you'll see:
I am using IDLE for Python and Python 3 on Windows.
Python 3 introduced some major changes to how strings are handled in Python. In Python 3, all strings are stored as unicode.
You actually have a byte string, a string representing the code points that represent a character. So you need to decode it properly.
You can do this two ways, first is to make sure its a byte string to start with:
>>> i = b"\xd9\x88"
>>> print(i.decode('utf-8'))
و
Or, you can encode it to latin-1 first, which will give you a bytestring, then decode it:
>>> i = "\xd9\x88"
>>> type(i)
<class 'str'>
>>> type(i.encode('latin-1'))
<class 'bytes'>
>>> print(i.encode('latin-1').decode('utf-8'))
و
Upvotes: 4