Reputation: 29

Unicode in Python for Kannada language

I am working with Python 2.7 and typed the following code:

print u'\u0cb5\u0ccd\u0c87'

Since my Unicode string contains the Kannada consonant "v" followed by the Kannada vowel "i", I expected the output to be a single Kannada character representing the syllable/akshara "vi", but instead I got ವ್ಇ. How can I fix this and instead get the character for "vi"?

Upvotes: 2

Answers (3)

Louis

Reputation: 151401

I believe you have not composed your string from the right code points. I expect this is what you want:

>>> print u'\u0cb5\u0CBF'
ವಿ

What you did was to output (using the full names that Unicode assigns to these characters):

KANNADA LETTER VA
KANNADA SIGN VIRAMA
KANNADA LETTER I
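
You can check these names yourself with the standard `unicodedata` module. Here is a minimal sketch; `describe` is just a helper name I made up for illustration, not part of any library:

```python
import unicodedata

def describe(text):
    """Return the Unicode name of each code point in text (hypothetical helper)."""
    return [unicodedata.name(ch) for ch in text]

# The sequence from the question: consonant, virama, independent vowel.
for name in describe(u'\u0cb5\u0ccd\u0c87'):
    print(name)
```

Running this on any string that does not render as expected is a quick way to see which code points it really contains.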

I can see the logic in this, but that's not how Unicode works. The virama suppresses the consonant's inherent vowel, so it should be used only for consonant clusters or for a sequence that ends in a bare consonant. To combine a consonant with a vowel, you have to use the consonant letter followed by the combining (dependent) form of the vowel:

KANNADA LETTER VA
KANNADA VOWEL SIGN I

The KANNADA VOWEL SIGN I is a combining form of the letter "I" whereas KANNADA LETTER I is a non-combining form of the same letter.
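
If you already have text built with the virama-plus-independent-vowel pattern, something along these lines could rewrite it. This is only a rough sketch under my own assumptions: `vowel_sign_for` and `fix_virama_vowel` are hypothetical names, the lookup relies on the Unicode character names following the "KANNADA LETTER X" / "KANNADA VOWEL SIGN X" pattern, and it assumes a virama directly followed by an independent vowel is always a mistake in your data:

```python
import unicodedata

VIRAMA = u'\u0ccd'
LETTER_PREFIX = 'KANNADA LETTER '

def vowel_sign_for(letter):
    """Return the dependent vowel sign matching an independent Kannada vowel,
    or None if there is none (consonants, the letter A, non-Kannada)."""
    name = unicodedata.name(letter, '')
    if not name.startswith(LETTER_PREFIX):
        return None
    try:
        return unicodedata.lookup('KANNADA VOWEL SIGN ' + name[len(LETTER_PREFIX):])
    except KeyError:
        return None

def fix_virama_vowel(text):
    """Replace each 'virama + independent vowel' pair with the vowel sign."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == VIRAMA and i + 1 < len(text):
            sign = vowel_sign_for(text[i + 1])
            if sign is not None:
                out.append(sign)
                i += 2  # skip both the virama and the independent vowel
                continue
        out.append(text[i])
        i += 1
    return u''.join(out)

fixed = fix_virama_vowel(u'\u0cb5\u0ccd\u0c87')
for ch in fixed:
    print(unicodedata.name(ch))
# KANNADA LETTER VA
# KANNADA VOWEL SIGN I
```

A virama followed by another consonant (a real consonant cluster) is left untouched, because no vowel sign exists with that consonant's name.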

I suggest reading chapter 9 of the Unicode Standard for a complete explanation of how to deal with South Asian scripts. Chapter 10 can also be useful.

Upvotes: 4