Python string and unicode objects have the following methods for case conversion:
upper()
lower()
title()
Using unicode strings, I can handle nearly all characters in my local alphabet:
test_str = u"ças şak ürt örkl"
print test_str.upper()
>> ÇAS ŞAK ÜRT ÖRKL
Except for two letters. Since I live in Turkey, I have the typical Turkish I problem.
In my local alphabet we have the letter İ, which is similar to I, and their case conversions must be as follows:
I → lowercase → ı
i → uppercase → İ
And yes, this breaks the ASCII conversion i → I, since in Turkish i and I are two separate letters rather than an uppercase/lowercase pair.
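To make the four letters concrete, here is a minimal Python 3 sketch (not part of the original question) that prints each one as a distinct Unicode code point, using the standard `unicodedata` module:

```python
import unicodedata

# The four letters involved in the Turkish I problem are four distinct
# code points; print each with its official Unicode name.
for ch in "Iıİi":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The output lists LATIN CAPITAL LETTER I (U+0049), LATIN SMALL LETTER DOTLESS I (U+0131), LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) and LATIN SMALL LETTER I (U+0069).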
test_str = u"ik"
print test_str.upper()
>> IK # Wrong! must be İK
test_str = u"IK"
print test_str.lower()
>> ik # Wrong! must be ık
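For readers on Python 3, a minimal sketch reproducing the same behaviour with the built-in `str` methods. Python 3 applies Unicode's default, locale-independent case mappings, so both directions still come out wrong for Turkish. It also shows that capital İ lowercases to `'i'` followed by U+0307 COMBINING DOT ABOVE rather than to a plain `'i'`:

```python
# Python 3: the built-in case conversion ignores the Turkish locale.
print("ik".upper())  # IK  (Turkish expects İK)
print("IK".lower())  # ik  (Turkish expects ık)

# Capital dotted İ (U+0130) lowercases to two code points under the
# default Unicode mapping: 'i' plus U+0307 COMBINING DOT ABOVE.
lowered = "İ".lower()
print([hex(ord(c)) for c in lowered])  # ['0x69', '0x307']
```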
How can I overcome this? Is there a way to handle case conversions correctly using Python built-ins?
Python currently doesn't have any support for locale-specific case folding, or the other rules in Unicode SpecialCasing.txt. If you need it today, you can get them from PyICU.
>>> import icu
>>> unicode(icu.UnicodeString(u'IK').toLower(icu.Locale('TR')))
u'ık'
Although if all you care about is the Turkish I, you might prefer to just special-case it.
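Special-casing it can be as small as the following Python 3 sketch. The helper names `turkish_upper`, `turkish_lower` and `turkish_title` are hypothetical, and the sketch assumes the four I mappings listed in the question are the only Turkish-specific rules you need. It remaps the I letters with `str.translate` first, then lets the built-in methods handle every other character:

```python
import re

# Hypothetical helpers: remap the Turkish I letters first, then let the
# built-in (locale-independent) str methods handle every other character.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def turkish_upper(s: str) -> str:
    """Uppercase s, mapping i → İ and ı → I."""
    return s.translate(_TR_UPPER).upper()


def turkish_lower(s: str) -> str:
    """Lowercase s, mapping I → ı and İ → i."""
    return s.translate(_TR_LOWER).lower()


def turkish_title(s: str) -> str:
    """Title-case s; assumes a word is a run of regex word characters."""
    return re.sub(
        r"\w+",
        lambda m: turkish_upper(m.group()[0]) + turkish_lower(m.group()[1:]),
        s,
    )


print(turkish_upper("ik"))             # İK
print(turkish_lower("IK"))             # ık
print(turkish_title("istanbul ılık"))  # İstanbul Ilık
```

Because the word boundaries come from the `\w+` regex, `turkish_title` can differ from `str.title()` on digits and other non-letter word characters; that is a simplification of this sketch, not a Turkish rule.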