Mp0int

Reputation: 18727

Changing case of letters in unicode string containing accent and local letters

Python string and unicode objects have methods for case conversion, such as upper() and lower().

Using unicode strings, I can handle nearly all characters in my local alphabet:

test_str = u"ças şak ürt örkl"
print test_str.upper()
>> ÇAS ŞAK ÜRT ÖRKL

The exception is two letters. Since I live in Turkey, I have the typical Turkish I problem.

In the Turkish alphabet, dotted İ/i and dotless I/ı are separate letters, so their case conversion must work as follows:

I → lowercase → ı

i → uppercase → İ

And yes, this conflicts with the ASCII conversion i --> I, since in Turkish i and I are two separate letters.

test_str = u"ik"
print test_str.upper()
>> IK  # Wrong! must be İK
test_str = u"IK"
print test_str.lower()
>> ik  # Wrong! must be ık

How can I overcome this? Is there a way to handle these case conversions correctly using Python built-ins?

Upvotes: 6

Views: 870

Answers (1)

bobince

Reputation: 536429

Python currently doesn't have any support for locale-specific case folding, or for the other rules in Unicode SpecialCasing.txt. If you need them today, you can get them from PyICU:

>>> import icu
>>> unicode( icu.UnicodeString(u'IK').toLower(icu.Locale('TR')) )
u'ık'

Although if all you care about is the Turkish I, you might prefer to just special-case it.
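A minimal sketch of what that special-casing could look like, assuming only the two Turkish I pairs need custom handling and everything else can go through the built-in methods. The names turkish_upper and turkish_lower are hypothetical helpers, not part of Python or PyICU:

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers (not a Python or PyICU API): remap the Turkish
# dotted/dotless I pairs first, then defer to the built-in conversions.

def turkish_upper(text):
    # i -> İ (U+0130) and ı (U+0131) -> I, then the generic upper()
    return text.replace(u"i", u"\u0130").replace(u"\u0131", u"I").upper()


def turkish_lower(text):
    # İ (U+0130) -> i and I -> ı (U+0131), then the generic lower()
    return text.replace(u"\u0130", u"i").replace(u"I", u"\u0131").lower()
```

With these, turkish_upper(u"ik") gives İK and turkish_lower(u"IK") gives ık. The replacements run before upper() and lower() so that the built-in mappings never see the letters they would convert incorrectly. Note that this applies the Turkish rules unconditionally, so it is only appropriate for text known to be Turkish.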

Upvotes: 5
