abdel-kader Magdy

Reputation: 85

Arabic words in Python

I have a problem printing Arabic text in Python. I wrote code that converts English characters into Arabic ones (so-called chat language, or Franco-Arabic) and then combines the different results to produce suggestions based on user input.

import itertools

def transliterate(francosentence, verbose=False):
    francowords = francosentence.split()
    arabicconvertedwords = []
    for i in francowords:
        rankeddata=[]
        rankeddata=transliterate_word(i)
        arabicconvertedwords.append(rankeddata)
        for index in range(len(rankeddata)):
            print rankeddata[index]

    ran=list(itertools.product(*arabicconvertedwords))
    for I in range(len(ran)):
        print ran[I]

The first print (print rankeddata[index]) shows Arabic words, but after the combination step runs, the second print (print ran[I]) shows something like this: (u'\u0627\u0646\u0647', u'\u0631\u0627\u064a\u062d', u'\u0627\u0644\u062c\u0627\u0645\u0639\u0647')

How can I print Arabic words?

Upvotes: 1

Views: 1439

Answers (1)

ShadowRanger

Reputation: 155584

Your second loop is operating over tuples of unicode (product yields a single product at a time as a tuple), not individual unicode values.

While print uses the str form of the object it prints, a tuple's str form uses the repr of the objects it contains; it doesn't propagate "str-iness" (technically, tuple lacks __str__ entirely, so it falls back to __repr__).
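To make the str-versus-repr point concrete, here is a minimal sketch. The Word class is hypothetical (it is not part of the question's code); it only exists so that str and repr visibly differ in the same way on Python 2 and Python 3.

```python
class Word(object):
    """Hypothetical stand-in for a value whose str and repr differ."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Word(%r)" % self.text


word = Word("salam")

# str() on the object itself uses Word.__str__
alone = str(word)

# str() on a tuple formats every element with repr, not str
in_tuple = str((word, word))

# Converting each element explicitly and joining gives the str forms
joined = ", ".join(str(w) for w in (word, word))
```

Here alone is the plain text, in_tuple contains the Word(...) repr of each element, and joined contains only the str forms. The same mechanism is what turns unicode values into u'\u0627...' escapes when a tuple of them is printed on Python 2.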

If you want to see the Arabic, you need to print the elements individually or concatenate them so you're printing strings, not a tuple. For example, you could change:

print ran[I]

to something like:

print u', '.join(ran[I])

which produces a single comma-separated unicode value that print formats as expected (the str form), rather than the repr form with escapes for non-ASCII values.
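As a rough illustration using the tuple from the question, the sketch below joins the three values into one unicode string. The helper name format_combination is an assumption made for this example, not something from the original code.

```python
# -*- coding: utf-8 -*-

def format_combination(combo, separator=u', '):
    """Join one tuple of unicode words into a single unicode string.

    Hypothetical helper for illustration; combo is one tuple as
    yielded by itertools.product.
    """
    return separator.join(combo)


# The tuple shown in the question, written with the same escapes
combo = (u'\u0627\u0646\u0647',
         u'\u0631\u0627\u064a\u062d',
         u'\u0627\u0644\u062c\u0627\u0645\u0639\u0647')

line = format_combination(combo)
```

Passing line to print should display the three Arabic words separated by commas, provided the terminal's encoding can represent them, because print is now given a single string rather than a tuple.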

Side-note: As a point of style (and memory use), use the iterator protocol directly; don't listify everything and then use C-style indexing loops. The following code has to store a ton of stuff in memory if the inputs are large (the total number of outputs is the product of the lengths of the inputs):

ran=list(itertools.product(*arabicconvertedwords))
for I in range(len(ran)):
    print u', '.join(ran[I])

where it could easily produce just one item at a time on demand, delivering the first results sooner with almost no extra memory:

# Don't listify...
ran = itertools.product(*arabicconvertedwords)
for r in ran:  # Iterate items directly, no need for list or indexing
    print u', '.join(r)
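To put numbers on the memory point, here is a small sketch with made-up candidate lists (the names and values are placeholders, not real transliteration output). It computes how many combinations exist and then pulls only the first few from the lazy iterator.

```python
import itertools

# Placeholder candidate lists, one per input word (assumed data)
candidates = [
    ["a1", "a2", "a3"],
    ["b1", "b2"],
    ["c1", "c2", "c3", "c4"],
]

# The number of combinations is the product of the list lengths
total = 1
for options in candidates:
    total *= len(options)

# product builds combinations on demand; islice stops after three,
# so the remaining combinations are never created
lazy = itertools.product(*candidates)
first_three = list(itertools.islice(lazy, 3))
```

With lists of length 3, 2 and 4 there are 24 combinations in total, but only three tuples are ever built here. Wrapping the product in list() would build all 24 up front, and that count grows multiplicatively with every extra word.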

Upvotes: 3
