Getting the upper or lower case of a Unicode code point (as uint32_t)

Question

Is there a way to get the upper- or lower-case character for a given Unicode code point (or the equivalent UTF-8 code unit sequence)?

I read that this could be done with ICU, but that would be the only thing I'd need ICU for, so I don't want to import a whole huge library (with its licences and dependencies, if any) for a single feature.

I also read that upper and lower case depend on the locale. What does this mean exactly?

Thanks for your help.

PS: I can't use C++11; I'm using VS2005.

Rob Napier · Accepted Answer

ICU is the right tool for this. Case-folding (the idea that multiple symbols represent the same "letter") is a tricky concept in its general form.

What's the uppercase form of i? That depends on what country we are in and what language we are writing. English has the pair Ii. Turkish has two pairs: dotted İi and dotless Iı, so in Turkish the uppercase of i is İ, not I. So it's not so simple, and that explains the "locale matters" part of the problem.
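To make the locale dependence concrete, here is a minimal sketch that handles only the "i" letters. The names `CodePoint` and `upper_i` and the `turkic` flag are made up for illustration (they are not ICU APIs), and a real implementation would take a proper locale identifier instead of a boolean. ICU covers the same ground with its locale-aware string functions such as `u_strToUpper`.

```cpp
// Hypothetical helper, not part of ICU: upper-cases only the "i" letters.
// unsigned long is at least 32 bits, so it works without <stdint.h> (VS2005).
typedef unsigned long CodePoint;

// turkic == true stands in for a Turkish or Azerbaijani locale (an assumption
// made to keep the sketch short).
CodePoint upper_i(CodePoint cp, bool turkic)
{
    switch (cp) {
    case 0x0069:                         // i  LATIN SMALL LETTER I
        return turkic ? 0x0130 : 0x0049; // İ in Turkic locales, I elsewhere
    case 0x0131:                         // ı  LATIN SMALL LETTER DOTLESS I
        return 0x0049;                   // I in every locale
    default:
        return cp;                       // everything else is out of scope here
    }
}
```

The same input code point U+0069 yields U+0049 or U+0130 depending on the flag, which is why a function of the code point alone cannot be correct for every language.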

Another interesting case is the capital for the German ß (Eszett or "sharp S" in English). Its capital form is two letters, SS. So there's no promise that the uppercase form of a string will even have the same number of letters in it.
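The ß case also shows why a signature like "code point in, code point out" is too narrow. A rough sketch of a wider interface follows. `upper_full` is a hypothetical name, and the only mappings it knows are ß and ASCII a-z, which is nowhere near a complete table. The buffer size of three is an assumption chosen to leave room for multi-code-point results.

```cpp
#include <cstddef>

// unsigned long is at least 32 bits, so it works without <stdint.h> (VS2005).
typedef unsigned long CodePoint;

// Hypothetical helper: writes the uppercase form of cp into out (room for 3
// code points) and returns how many code points were written.
std::size_t upper_full(CodePoint cp, CodePoint out[3])
{
    if (cp == 0x00DF) {                 // ß  LATIN SMALL LETTER SHARP S
        out[0] = 0x0053;                // S
        out[1] = 0x0053;                // S
        return 2;                       // one code point became two
    }
    if (cp >= 0x0061 && cp <= 0x007A) { // ASCII a-z
        out[0] = cp - 0x20;             // A-Z
        return 1;
    }
    out[0] = cp;                        // unknown to this sketch: pass through
    return 1;
}
```

A caller converting a whole string has to size its output from the returned counts rather than from the input length.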

It's possible that there's some small library that just focuses on case folding, but I'm not aware of it. Generally to do Unicode reasonably, you have to do a lot of Unicode.
