Reputation: 29
I am working with Python 2.7 and typed the following code:
print u'\u0cb5\u0ccd\u0c87'
Since my Unicode string contains the Kannada consonant "v" followed by the Kannada vowel "i", I expected the output to be a single Kannada character representing the syllable/akshara "vi", but instead I got ವ್ಇ. How can I fix this and instead get the character for "vi"?
Upvotes: 2
Views: 3089
Reputation: 151401
I believe your string does not contain the right code points. I expect this is what you want:
>>> print u'\u0cb5\u0CBF'
ವಿ
What you did was to output (using the full names that Unicode assigns to these characters):
U+0CB5 KANNADA LETTER VA
U+0CCD KANNADA SIGN VIRAMA
U+0C87 KANNADA LETTER I
I can see the logic in this, but that's not how Unicode works. The virama should be used only for consonant clusters, or when a sequence ends in a bare consonant. To combine a consonant with a vowel, you have to use the consonant letter followed by the combining form of the vowel:
U+0CB5 KANNADA LETTER VA
U+0CBF KANNADA VOWEL SIGN I
The KANNADA VOWEL SIGN I is a combining form of the letter "I" whereas KANNADA LETTER I is a non-combining form of the same letter.
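To see the difference for yourself, you can ask Python for the Unicode name and general category of every character in both strings. This is only a small sketch using the standard `unicodedata` module; the helper name `describe` is made up for illustration. It prints ASCII only, so it does not depend on your terminal font.

```python
from __future__ import print_function  # keeps the print calls valid on Python 2.7
import unicodedata


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


original = u'\u0cb5\u0ccd\u0c87'   # sequence from the question
corrected = u'\u0cb5\u0cbf'        # sequence suggested in this answer

for label, text in (("original", original), ("corrected", corrected)):
    print(label)
    for row in describe(text):
        print("  %s  %s  (%s)" % row)
```

The last character of the original string is reported as a letter (category `Lo`), while the last character of the corrected string is reported as a combining mark (a category starting with `M`). That is why only the second one attaches to the consonant.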
I suggest reading chapter 9 of the Unicode standard for a complete explanation of how to deal with South Asian scripts. Chapter 10 can also be useful.
Upvotes: 4
Reputation: 2958
If you don't use a font that has Kannada glyphs, you'll get boxes.
Got this after installing lohit-fonts on my Gentoo box:
[screenshot]
Upvotes: -1
Reputation:
0cb5 is Unicode Character 'KANNADA LETTER VA' (U+0CB5) ವ. So Python is correct to print ವ್ಇ.
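If you already have data written as consonant + virama + independent vowel, one possible way to repair it is to swap each "virama + independent vowel" pair for the matching vowel sign. The sketch below is an assumption about what you want, not a standard normalization: the function names are hypothetical, it finds the vowel sign by Unicode name through `unicodedata`, and it treats every virama directly followed by an independent vowel as a mistake, which may not hold for all text. A virama followed by KANNADA LETTER A is left alone, since there is no vowel sign for the inherent vowel.

```python
from __future__ import print_function  # keeps the print call valid on Python 2.7
import unicodedata

VIRAMA = u'\u0ccd'  # KANNADA SIGN VIRAMA
LETTER_PREFIX = "KANNADA LETTER "
SIGN_PREFIX = "KANNADA VOWEL SIGN "


def vowel_sign_for(ch):
    """Return the vowel sign whose name matches this letter, or None."""
    name = unicodedata.name(ch, "")
    if not name.startswith(LETTER_PREFIX):
        return None
    try:
        # e.g. "KANNADA LETTER I" -> "KANNADA VOWEL SIGN I"
        return unicodedata.lookup(SIGN_PREFIX + name[len(LETTER_PREFIX):])
    except KeyError:
        # Consonants (and LETTER A) have no vowel sign of the same name.
        return None


def merge_virama_vowel(text):
    """Replace each virama + independent vowel pair with the vowel sign."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == VIRAMA and i + 1 < len(text):
            sign = vowel_sign_for(text[i + 1])
            if sign is not None:
                out.append(sign)
                i += 2  # skip both the virama and the independent vowel
                continue
        out.append(text[i])
        i += 1
    return u''.join(out)


fixed = merge_virama_vowel(u'\u0cb5\u0ccd\u0c87')
print(["U+%04X" % ord(ch) for ch in fixed])  # ['U+0CB5', 'U+0CBF']
```

Real consonant clusters such as u'\u0cb5\u0ccd\u0caf' pass through unchanged, because the character after the virama is a consonant and has no vowel sign of the same name.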
Upvotes: 0