Reputation: 12027
In Turkish, there's a letter İ, which is the uppercase form of i. When I convert it to lowercase, I get a weird result. For example:
var string_tr = "İ".toLowerCase();
var string_en = "i";
console.log( string_tr == string_en ); // false
console.log( string_tr.split("") ); // ["i", "̇"]
console.log( string_tr.charCodeAt(1) ); // 775
console.log( string_en.charCodeAt(0) ); // 105
"İ".toLowerCase() returns an extra character, and if I'm not mistaken, it's COMBINING DOT ABOVE (U+0307).
How do I get rid of this character?
I could just filter the string:
var string_tr = "İ".toLowerCase();
string_tr = string_tr.split("").filter(function (item) {
    if (item.charCodeAt(0) != 775) {
        return true;
    }
}).join("");
console.log(string_tr.split(""));
but am I handling this correctly? Is there a preferable way? Furthermore, why does this extra character appear in the first place?
There's some inconsistency. For example, in Turkish, there's a lowercase form of I: ı. How come the following comparison returns true
console.log( "ı".toUpperCase() == "i".toUpperCase() ) // true
while
console.log( "İ".toLowerCase() == "i" ) // false
returns false?
Upvotes: 20
Views: 5689
Reputation: 1
You can just use toLocaleLowerCase or toLocaleUpperCase with the appropriate locale tag (for example "tr-TR") for languages like Turkish and other alphabets with dotted and dotless i versions such as Azerbaijani, Kazakh, Tatar, and Crimean Tatar.
var string_tr = "İ".toLocaleLowerCase("tr-TR");
var string_en = "i";
console.log( string_tr == string_en ); // true
console.log( string_tr.split("") ); // ["i"]
console.log( string_tr.charCodeAt(0) ); // 105
console.log( string_en.charCodeAt(0) ); // 105
Upvotes: 0
Reputation: 224913
You’ll need a Turkish-specific case conversion, available with String#toLocaleLowerCase:
let s = "İ";
console.log(s.toLowerCase().length); // 2 ("i" followed by U+0307)
console.log(s.toLocaleLowerCase('tr-TR').length); // 1
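As for why the extra character shows up: as far as I can tell, the locale-independent Unicode mapping lowercases İ (U+0130) to "i" plus COMBINING DOT ABOVE (U+0307) so that the dot isn't lost, while the Turkish rules map İ to i and I to ı directly. Here is a rough sketch of the difference. The helper names lowerTr and upperTr are made up for illustration, and it assumes a runtime with Intl locale data (browsers, or Node with its default ICU build):

```javascript
// Hypothetical helpers (not built-ins): case conversion using Turkish rules.
function lowerTr(text) {
  return text.toLocaleLowerCase('tr-TR');
}

function upperTr(text) {
  return text.toLocaleUpperCase('tr-TR');
}

// Locale-independent lowercase keeps the dot as a separate combining mark.
const defaultLower = "İ".toLowerCase();
const codePoints = Array.from(defaultLower, ch => ch.codePointAt(0).toString(16));
console.log(codePoints); // ["69", "307"]

// Turkish rules: İ <-> i and I <-> ı.
console.log(lowerTr("İ") === "i"); // true
console.log(lowerTr("I") === "ı"); // true
console.log(upperTr("i") === "İ"); // true

// The default uppercase of dotless ı is plain I, which is why
// "ı".toUpperCase() == "i".toUpperCase() is true in the question.
console.log("ı".toUpperCase() === "I"); // true
```

If the text is known to be Turkish, converting with the locale on both sides of a comparison is probably safer than filtering out U+0307 afterwards, since filtering would also strip combining dots that were in the input legitimately.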
Upvotes: 33