Hamad

Reputation: 383

How to convert Arabic text to number in Python

Using Python, I'm trying to write some simple code that converts Arabic text to numbers. The code I used can be found here and I'm trying to adapt it from English to Arabic. For some reason, it doesn't work properly:

    def text2int(textnum, numwords={}):
        if not numwords:
            units = [
                "", "واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "ستة", "سبعة", "ثمانية",
                "تسعة",
                "عشرة", "أحد عشر", "اثنا عشر", "ثلاثة عشر", "أربعة عشر", "خمسة عشر",
                "ستة عشر", "سبعة عشر", "ثمانية عشر",
                "تسعة عشر"
            ]

            tens = [
                "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
                "تسعون"
            ]

            scales = ["مية", "الف", "مليون", "مليار", "ترليون"]

            numwords["و"] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                numwords[word] = (10 ** (idx * 3 or 2), 0)

        current = result = 0
        for word in textnum.split():
            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0

        return result + current

    print(text2int("خمسة و عشرون"))

The output I get is 5, which is wrong; it should be 25. Is there a way I could solve this? Also, the scales are not working at all.

Upvotes: 1

Views: 971

Answers (2)

Vaibhav Jadhav

Reputation: 2086

Just make the following change in your code:

for idx, word in enumerate(tens):
    numwords[word] = (1, (idx+2) * 10)
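
To see why the offset is needed: the tens list starts at عشرون (twenty), so index 0 has to map to 20 rather than 0. Here is a quick sketch of the mapping before and after the change (the names old_values and new_values are only for illustration and are not part of the original code):

```python
tens = ["عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون", "تسعون"]

# Original mapping: the first word (twenty) gets 0 * 10 = 0
old_values = {word: idx * 10 for idx, word in enumerate(tens)}

# Shifted mapping: the first word (twenty) gets (0 + 2) * 10 = 20
new_values = {word: (idx + 2) * 10 for idx, word in enumerate(tens)}

print(old_values["عشرون"], new_values["عشرون"])  # 0 20
print(old_values["تسعون"], new_values["تسعون"])  # 70 90
```

With the original mapping, عشرون contributes 0, which is why "خمسة و عشرون" came out as 5.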

Upvotes: 1

Rotem Tal

Reputation: 769

Try changing your tens variable like this:

tens = ["", "",
        "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
        "تسعون"]

That is, add two empty strings at the start so that each word's index matches its tens digit. Alternatively, you could change this line like so:

for idx, word in enumerate(tens):     numwords[word] = (1, (idx + 2) * 10)

as someone suggested in the comments; just make sure to keep the parentheses around idx + 2.
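
For reference, here is a minimal sketch of the whole function with the second variant applied. It assumes the input uses exactly the spellings in the lists and is split on whitespace. Building the dictionary inside the function instead of using a mutable default argument, and raising ValueError instead of Exception, are my own choices rather than anything from the question:

```python
def text2int(textnum):
    units = [
        "", "واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "ستة", "سبعة", "ثمانية",
        "تسعة",
        "عشرة", "أحد عشر", "اثنا عشر", "ثلاثة عشر", "أربعة عشر", "خمسة عشر",
        "ستة عشر", "سبعة عشر", "ثمانية عشر",
        "تسعة عشر",
    ]
    tens = [
        "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
        "تسعون",
    ]
    scales = ["مية", "الف", "مليون", "مليار", "ترليون"]

    # Each word maps to (scale, increment); "و" (and) is a no-op
    numwords = {"و": (1, 0)}
    for idx, word in enumerate(units):
        numwords[word] = (1, idx)
    for idx, word in enumerate(tens):
        # The list starts at twenty, so shift the index by 2
        numwords[word] = (1, (idx + 2) * 10)
    for idx, word in enumerate(scales):
        # idx 0 -> 10**2 (hundred), idx 1 -> 10**3 (thousand), idx 2 -> 10**6, ...
        numwords[word] = (10 ** (idx * 3 or 2), 0)

    current = result = 0
    for word in textnum.split():
        if word not in numwords:
            raise ValueError("Illegal word: " + word)

        scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            # A thousand-or-larger scale closes the current group
            result += current
            current = 0

    return result + current


print(text2int("خمسة و عشرون"))  # 25
print(text2int("ثلاثة الف"))  # 3000
```

Two limitations remain in this sketch. A scale word with no number in front of it (for example الف on its own) evaluates to 0, because current is still 0 when it gets multiplied by the scale; that may be part of why the scales seemed not to work. Also, the two-word entries such as أحد عشر can never match, since the input is split on whitespace before the lookup.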

Upvotes: 1
