representing \xd9\x88 (UTF-8 literal) in its arabic character و in Python

Question

I know that \xd9\x88is utf-8 code for the letter و in Arabic (you can see this page).

I have a file that contains a list of such utf-8 characters, how can I represent them in Arabic characters, for example represent \xd9\x88 with و ?

On Python 3 if I do:

>>> i = '\xd9\x88'
>>> print(i)
Ù

Burhan Khalid · Accepted Answer

If you want to print the character, simply use print(); but you have to make sure your terminal supports the encoding and is using a font that has that glyph.

In the Windows command prompt, with the default encoding (that doesn't support Arabic), you'll see this:

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> i = "\xd9\x88"
>>> print(i)
┘ê
>>>

On Linux, using UTF-8 as the default encoding and using a font that has the Arabic glyphs, you'll see this:

>>> i = "\xd9\x88"
>>> print(i)
و
>>>

Back on Windows, if you use a text editor that supports UTF-8 (in this case, I am using Sublime Text), you'll see:

sublime-text-screencap

I am using IDLE for Python and Python 3 on Windows.

Python 3 introduced some major changes to how strings are handled in Python. In Python 3, all strings are stored as unicode.

You actually have a byte string, a string representing the code points that represent a character. So you need to decode it properly.

You can do this two ways, first is to make sure its a byte string to start with:

>>> i = b"\xd9\x88"
>>> print(i.decode('utf-8'))
و

Or, you can encode it to latin-1 first, which will give you a bytestring, then decode it:

>>> i = "\xd9\x88"
>>> type(i)

>>> type(i.encode('latin-1'))

>>> print(i.encode('latin-1').decode('utf-8'))
و

representing \xd9\x88 (UTF-8 literal) in its arabic character و in Python

Answers (1)

Related Questions