Naseem

Reputation: 21

How to map an Arabic character to an English string using Python

I am trying to read a file that contains Arabic characters such as 'ع' and map each one to an English string such as "AYN". I want to create such a mapping for all 28 letters of the Arabic alphabet in Python 3.4. I am still a beginner in Python and do not know how to start. The file containing the Arabic characters is encoded in UTF-8.

Upvotes: 1

Views: 3151

Answers (3)

Roland Smith

Reputation: 43495

Use unicodedata:

(Note: this is Python 3. In Python 2, use u'ع' instead.)

In [1]: import unicodedata

In [2]: unicodedata.name('a')
Out[2]: 'LATIN SMALL LETTER A'

In [6]: unicodedata.name('ع')
Out[6]: 'ARABIC LETTER AIN'

In [7]: unicodedata.name('ع').split()[-1]
Out[7]: 'AIN'

The last line works for simple letters, but not for all Arabic characters. For example, ڥ is ARABIC LETTER FEH WITH THREE DOTS BELOW.

So you could use:

In [26]: unicodedata.name('ڥ').lower().split()[2]
Out[26]: 'feh'

or

In [28]: unicodedata.name('ڥ').lower()[14:]
Out[28]: 'feh with three dots below'

To identify characters, use something like this (Python 3):

import unicodedata

c = 'ع'
name = unicodedata.name(c).lower()  # e.g. 'arabic letter ain'
if 'arabic letter' in name:
    # Drop the 14-character prefix 'arabic letter '.
    print("{}: {}".format(c, name[14:]))

This would produce:

ع: ain

I'm filtering for the string 'arabic letter' because the Arabic Unicode block contains many other symbols as well.

A complete dictionary can be made with:

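import unicodedata

arabicdict = {}
for n in range(0x600, 0x700):  # the Arabic block, U+0600..U+06FF
    c = chr(n)
    try:
        name = unicodedata.name(c).lower()
        if 'arabic letter' in name:
            arabicdict[c] = name[14:]  # strip the prefix 'arabic letter '
    except ValueError:
        # unicodedata.name() raises ValueError for code points without a name
        pass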
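
To connect this back to the question (reading a UTF-8 file and mapping its characters), here is a minimal sketch. The function names `build_arabic_names` and `transliterate_file` and the temporary sample file are assumptions made for illustration, not part of the original answer. Note that the Unicode names may differ from the spelling you want (Unicode says AIN, the question says AYN), so you may still need to override some entries by hand.

```python
import os
import tempfile
import unicodedata


def build_arabic_names():
    """Map each 'ARABIC LETTER ...' character in U+0600..U+06FF to the rest of its name."""
    prefix = 'ARABIC LETTER '
    names = {}
    for codepoint in range(0x600, 0x700):
        char = chr(codepoint)
        # The second argument is a default, returned for code points without a name.
        name = unicodedata.name(char, '')
        if name.startswith(prefix):
            names[char] = name[len(prefix):]
    return names


def transliterate_file(path, names):
    """Return (character, name) pairs for every mapped character in a UTF-8 file."""
    with open(path, encoding='utf-8') as handle:
        text = handle.read()
    return [(ch, names[ch]) for ch in text if ch in names]


arabic_names = build_arabic_names()

# Hypothetical sample file, written only so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write('ع ب\n')
    sample_path = tmp.name

try:
    pairs = transliterate_file(sample_path, arabic_names)
finally:
    os.remove(sample_path)

print(pairs)  # [('ع', 'AIN'), ('ب', 'BEH')]
```

Passing `encoding='utf-8'` to `open()` explicitly avoids depending on the platform's default encoding.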

Upvotes: 4

james-see

Reputation: 13176

Use a simple dictionary in Python to do this properly. Make sure your file starts with the following lines:

#!/usr/bin/python
# -*- coding: utf-8 -*-

Here is Python 2 code that should work for you (I also added examples of how to get the values out of your dictionary, since you are a beginner):

exampledict = {unicode(('ا').decode('utf-8')):'ALIF',unicode(('ع').decode('utf-8')):'AYN'}
keys = exampledict.keys()
values = exampledict.values()
print(keys)
print(values)
exit()

Output:

[u'\u0639', u'\u0627']
['AYN', 'ALIF']
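
Because `unicode()` and `str.decode()` exist only in Python 2, the code above will not run on the Python 3.4 mentioned in the question. A rough Python 3 equivalent might look like the sketch below; the names `letter_names` and `spell` are hypothetical, and only the two letters from the example are filled in.

```python
# Python 3: string literals are already Unicode, so no unicode()/decode() is needed.
letter_names = {
    'ا': 'ALIF',
    'ع': 'AYN',
}


def spell(text, names=letter_names):
    """Replace each mapped character with its English name; keep other characters as-is."""
    return ' '.join(names.get(ch, ch) for ch in text if not ch.isspace())


print(letter_names['ع'])  # AYN
print(spell('ع ا'))       # AYN ALIF
```

Using `names.get(ch, ch)` means characters missing from the dictionary pass through unchanged instead of raising `KeyError`.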

Hope this helps you on your journey learning Python; it is fun!

Upvotes: 1

Malik Brahimi

Reputation: 16711

Look up the Unicode code point of each character and then construct a dictionary as follows:

arabic = {'alif': u'\u0623', 'baa': u'\u0628', ...} # use unicode mappings like so
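
The dictionary above maps names to characters, while the question asks for the opposite direction. One possible way to flip it is a dict comprehension, sketched here with only the two entries shown in the answer (the remaining letters would still have to be added by hand); `char_to_name` is a hypothetical name.

```python
# Only the two entries given in the answer; the rest must be filled in manually.
arabic = {'alif': '\u0623', 'baa': '\u0628'}

# Invert to character -> upper-case English name.
char_to_name = {char: name.upper() for name, char in arabic.items()}

print(char_to_name['\u0628'])  # BAA
```

This only works cleanly if every character appears once; if two names shared a character, the later entry would overwrite the earlier one.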

Upvotes: 1
