charvi

Reputation: 211

How to traverse a Unicode Tamil word character by character in Python?

I want to know how many characters there are in a Unicode (Tamil) string, and then check the first and second characters for particular occurrences.
I am able to split the word into characters, but I do not know how to traverse them one by one using the word length.

Example: the word "எஃகு".
It should return the number of characters as 3, and I should be able to print word[0] as 'எ', word[1] as 'ஃ' and word[2] as 'கு'.

I want to check something like:

    if word[0] is a vowel:
        if word[1] is "ஃ":
            then print word[0]+word[1]+word[2] (as எஃகு)
        else:
            print word[0]

I want to traverse using the number of characters: if the number of characters is 3, then i=0 should let me process 'எ'.
I have seen many questions related to Unicode character and length processing, but they all either return the byte length or give varying results, so I am confused.

Code that I use for splitting them character-wise:

    from tamil import utf8  # Open-Tamil

    # f is the input file, ff is the output file
    for line in f.readlines():
        letters = utf8.get_letters(line)
        for letter in letters:
            ff.write(unicode(letter))
            ff.write(' ')

Sample Input File:

அன்று
அதாவது
அஃதான்று

Sample Output File:

அ ன் று
அ தா வ து
அ ஃ தா ன் று

Upvotes: 2

Views: 3392

Answers (1)

Amadan

Reputation: 198446

Package

    pip install Open-Tamil

Code

    from tamil import utf8

    string = u"எஃகு"
    # get_letters groups each base character with its vowel sign or pulli
    letters = utf8.get_letters(string)
    print(len(letters))
    # 3, not 4 (the string has 4 code points)
    print(letters)
    # [u'\u0b8e', u'\u0b83', u'\u0b95\u0bc1']
    for letter in letters:
        print(letter)
    # எ
    # ஃ
    # கு
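
Once the word is a list of letters, the check from the question is ordinary list indexing. The following is only a rough sketch, assuming Python 3. The names `split_letters`, `is_sign`, `is_vowel` and `check` are hypothetical, and `split_letters` is a simplified stdlib-only stand-in for `utf8.get_letters`, not Open-Tamil's actual implementation: it just attaches Tamil vowel signs, the pulli, the anusvara and the au length mark to the preceding character.

```python
AYTHAM = "\u0b83"  # ஃ


def is_sign(ch):
    """True for Tamil code points that attach to the preceding character:
    anusvara (U+0B82), vowel signs and pulli (U+0BBE..U+0BCD),
    and the au length mark (U+0BD7)."""
    cp = ord(ch)
    return cp == 0x0B82 or 0x0BBE <= cp <= 0x0BCD or cp == 0x0BD7


def split_letters(word):
    """Hypothetical stand-in for utf8.get_letters (an assumption, not the
    library's code): group each base character with the signs after it."""
    letters = []
    for ch in word:
        if letters and is_sign(ch):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def is_vowel(letter):
    """True for a standalone Tamil vowel (block range U+0B85..U+0B94)."""
    return len(letter) == 1 and 0x0B85 <= ord(letter) <= 0x0B94


def check(word):
    """Apply the check from the question; return the text to print,
    or None when the first letter is not a vowel."""
    letters = split_letters(word)
    if not letters or not is_vowel(letters[0]):
        return None
    if len(letters) > 2 and letters[1] == AYTHAM:
        return letters[0] + letters[1] + letters[2]
    return letters[0]


if __name__ == "__main__":
    word = "\u0b8e\u0b83\u0b95\u0bc1"  # எஃகு
    letters = split_letters(word)
    for i in range(len(letters)):  # traverse by index, as asked
        print(i, letters[i])
    print(check(word))
```

With Open-Tamil installed, `split_letters(word)` can presumably be replaced by `utf8.get_letters(word)`, and the rest of the check stays the same.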

Upvotes: 5
