Reputation: 383
Using Python, I'm trying to write some simple code that converts Arabic number words to integers. The code I used can be found here, and I'm trying to adapt it from English to Arabic. For some reason, it doesn't work very well:
def text2int(textnum, numwords={}):
    if not numwords:
        units = [
            "", "واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "ستة", "سبعة", "ثمانية",
            "تسعة",
            "عشرة", "أحد عشر", "اثنا عشر", "ثلاثة عشر", "أربعة عشر", "خمسة عشر",
            "ستة عشر", "سبعة عشر", "ثمانية عشر",
            "تسعة عشر"
        ]
        tens = [
            "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
            "تسعون"
        ]
        scales = ["مية", "الف", "مليون", "مليار", "ترليون"]
        numwords["و"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)
    current = result = 0
    for word in textnum.split():
        if word not in numwords:
            raise Exception("Illegal word: " + word)
        scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current

print(text2int("خمسة و عشرون"))
The output I get is 5, which is wrong; it should be 25. Is there a way to fix this? Also, the scales are not working at all.
Upvotes: 1
Views: 971
Reputation: 2086
Just make the following change in your code:
for idx, word in enumerate(tens):
    numwords[word] = (1, (idx + 2) * 10)
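The reason this helps: enumerate starts counting at 0, and the tens list starts at "عشرون" (twenty), so without the offset the first word is mapped to 0 * 10 = 0. A small sketch of the difference, using enumerate's start parameter as an equivalent way to write the offset (the names by_default and shifted are only for illustration):

```python
tens = [
    "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
    "تسعون",
]

# Original mapping: idx starts at 0, so "عشرون" (twenty) ends up as 0.
by_default = {word: idx * 10 for idx, word in enumerate(tens)}

# Shifted mapping: start=2 has the same effect as (idx + 2) * 10.
shifted = {word: idx * 10 for idx, word in enumerate(tens, start=2)}

print(by_default["عشرون"])  # 0
print(shifted["عشرون"])     # 20
```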
Upvotes: 1
Reputation: 769
Try changing your tens variable like this:
tens = ["", "",
"عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
"تسعون" ]
That adds two empty strings at the start of the list, so each word's index matches its tens digit. Alternatively, you could change this line to:
for idx, word in enumerate(tens): numwords[word] = (1, (idx + 2) * 10)
as someone suggested in the comments. Just make sure to put parentheses around idx + 2, otherwise idx + 2 * 10 is evaluated as idx + 20.
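On the "scales are not working" part of the question, two things in the posted code look like likely causes, though this is a guess from reading it: a scale word on its own (for example "الف") multiplies a current value of 0 and so yields 0, and the two-word entries such as "ثلاثة عشر" can never match because textnum.split() breaks them apart. Below is a minimal sketch that addresses both. The helper build_numwords, the NUMWORDS constant, the two-word lookup, and the "treat a bare scale as 1 × scale" rule are assumptions added here, not part of the original code. It only recognizes the exact spellings in the lists, so variants such as "ألف" or "مئة" would still be rejected.

```python
def build_numwords():
    """Build the word -> (scale, increment) table (hypothetical helper)."""
    units = [
        "", "واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "ستة", "سبعة", "ثمانية",
        "تسعة", "عشرة", "أحد عشر", "اثنا عشر", "ثلاثة عشر", "أربعة عشر",
        "خمسة عشر", "ستة عشر", "سبعة عشر", "ثمانية عشر", "تسعة عشر",
    ]
    tens = [
        "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
        "تسعون",
    ]
    scales = ["مية", "الف", "مليون", "مليار", "ترليون"]

    numwords = {"و": (1, 0)}  # "and" contributes nothing
    for idx, word in enumerate(units):
        if word:  # skip the empty placeholder for zero
            numwords[word] = (1, idx)
    for idx, word in enumerate(tens, start=2):  # the list starts at twenty
        numwords[word] = (1, idx * 10)
    for idx, word in enumerate(scales):
        numwords[word] = (10 ** (idx * 3 or 2), 0)  # 100, 10**3, 10**6, ...
    return numwords


NUMWORDS = build_numwords()


def text2int(textnum):
    tokens = textnum.split()
    current = result = 0
    i = 0
    while i < len(tokens):
        # Try a two-word entry first (e.g. "ثلاثة عشر"), then a single word.
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in NUMWORDS:
            word, i = pair, i + 2
        else:
            word, i = tokens[i], i + 1
        if word not in NUMWORDS:
            raise ValueError("Illegal word: " + word)
        scale, increment = NUMWORDS[word]
        if scale > 1 and current == 0:
            current = 1  # assumption: a bare scale word means one of it
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current


print(text2int("خمسة و عشرون"))  # 25
print(text2int("ثلاثة عشر"))     # 13
print(text2int("الف"))           # 1000
```

Building the table once at module level also avoids the mutable default argument (numwords={}) from the original, which works there but is easy to trip over.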
Upvotes: 1