VIVEKANANDA

Reputation: 29

Unicode in Python for Kannada language

I am working with Python 2.7 and typed the following code:

print u'\u0cb5\u0ccd\u0c87'

Since my Unicode string contains the Kannada consonant "v" followed by the Kannada vowel "i", I expected the output to be a single Kannada character representing the syllable/akshara "vi", but instead I got ವ್ಇ. How can I fix this and instead get the character for "vi"?

Upvotes: 2

Views: 3089

Answers (3)

Louis

Reputation: 151401

I believe your string does not contain the right code points. I expect this is what you want:

>>> print u'\u0cb5\u0CBF'
ವಿ

What you did was to output (using the full names that Unicode assigns to these characters):

  • KANNADA LETTER VA
  • KANNADA SIGN VIRAMA
  • KANNADA LETTER I

I can see the logic in this, but that's not how Unicode works. The virama should be used only for consonant clusters or when a sequence ends in a bare consonant. To combine a consonant with a vowel, you use the consonant letter followed by the combining (dependent) form of the vowel:

  • KANNADA LETTER VA
  • KANNADA VOWEL SIGN I

The KANNADA VOWEL SIGN I is a combining form of the letter "I" whereas KANNADA LETTER I is a non-combining form of the same letter.
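To check which code points a string actually contains, the standard `unicodedata` module can report their official names. The snippet below is only a small sketch: the helper name `describe` is made up for illustration, and the `__future__` import is there so it runs the same way on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import unicodedata

def describe(text):
    """Return (code point, Unicode name, general category) for each character.

    Hypothetical helper for illustration; not part of any library.
    """
    return [('U+%04X' % ord(ch), unicodedata.name(ch), unicodedata.category(ch))
            for ch in text]

wrong = u'\u0cb5\u0ccd\u0c87'  # VA + VIRAMA + independent letter I
right = u'\u0cb5\u0cbf'        # VA + dependent vowel sign I

wrong_info = describe(wrong)
right_info = describe(right)

for label, info in (('wrong', wrong_info), ('right', right_info)):
    print(label)
    for code_point, name, category in info:
        print('  ', code_point, name, category)
```

The independent vowel is reported as a letter (category `Lo`), while the vowel sign is reported as a mark, which is why only the second sequence attaches to the preceding consonant.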

I suggest reading chapter 9 of the Unicode Standard for a complete explanation of how to deal with South Asian scripts. Chapter 10 can also be useful.

Upvotes: 4

graphite

Reputation: 2958

If you don't use a font that has Kannada glyphs, you'll get boxes.

Got this after installing lohit-fonts on my Gentoo box:

[Screenshot: after fonts installed.]

Upvotes: -1

user1907906

Reputation:

U+0CB5 is the Unicode character KANNADA LETTER VA (ವ), so Python is correct to print ವ್ಇ.

Upvotes: 0
