PoPeio

Reputation: 152

Does a reliable way to capitalize Unicode text exist?

I recently had to deal with some complex problems while working with Unicode strings in PHP, a language I know pretty well. The mbstring extension was not working properly, and we had a very hard time capitalizing Unicode letters. With ASCII text this is a trivial problem, already solved in a variety of ways.

If I had to solve this problem for ASCII text, I would probably just take each character, check whether it is a lowercase letter, and subtract 32 from its ASCII value. So far, though, I have not found anything explaining how capitalization of Unicode text is solved: do I need to store a complete associative table mapping every lowercase character to its uppercase version? I suppose (and hope) I will hear a huge NO!
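To make the ASCII approach concrete, here is a minimal sketch of what I mean. The function name `asciiToUpper` is just something I made up for illustration; it only touches the bytes for `a` to `z` and leaves everything else, including the bytes of multi-byte UTF-8 characters, unchanged.

```php
<?php
// Uppercase ASCII letters only: 'a'..'z' are codes 97..122, and each
// uppercase letter sits exactly 32 below its lowercase counterpart.
// (asciiToUpper is a hypothetical name used for illustration.)
function asciiToUpper(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $code = ord($s[$i]);
        if ($code >= 97 && $code <= 122) {
            $code -= 32; // shift 'a'..'z' down to 'A'..'Z'
        }
        $out .= chr($code); // every other byte is copied as-is
    }
    return $out;
}
```

This is exactly the part that stops working once letters like `é` or `ж` are involved, because their uppercase forms are not at a fixed offset.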

The heart of the question: is there a method to correctly convert lowercase characters to uppercase (and back) when operating on Unicode text? If so, which strategies are used?

For this question, suppose you do not have any module available, and I really mean ANY: no mbstring, no iconv, nothing. Moreover, for the sake of simplicity, suppose the problem of recognizing individual characters is already solved: our String object has a nextChar() method that returns the next character regardless of its byte length. What you want to do is take a string, iterate over it with nextChar() and capitalize each character where possible.

If anything is unclear or you need more information, just comment and I will try to answer your doubts, if they are not even bigger than mine at the moment ;)

Upvotes: 0

Views: 400

Answers (1)

manuelbcd

Reputation: 4537

You can try the PortableUTF8 library, written as an alternative to mbstring and iconv.

http://pageconfig.com/post/portable-utf8

Another interesting library is Stringy. It uses mbstring by default, but if the extension is not available it falls back to a polyfill package.

https://github.com/danielstjules/Stringy
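As far as I know, pure-PHP fallbacks of this kind come down to a lookup table, so the honest answer to the "associative table" question is closer to yes than no: unlike ASCII, Unicode has no fixed offset between cases, and the mappings are published as data in the Unicode Character Database (UnicodeData.txt for one-to-one mappings, SpecialCasing.txt for cases like ß becoming SS). Below is a minimal sketch of that strategy using only core string functions. The table is a tiny hand-picked excerpt, and the names `UPPER_MAP`, `utf8CharLength` and `unicodeToUpper` are hypothetical, not taken from either library.

```php
<?php
// Tiny hand-picked excerpt of a lowercase => uppercase table.
// Assumption: a real table would be generated from the Unicode
// Character Database rather than written by hand.
const UPPER_MAP = [
    "\u{00E9}" => "\u{00C9}", // é => É
    "\u{00F1}" => "\u{00D1}", // ñ => Ñ
    "\u{03B1}" => "\u{0391}", // α => Α
    "\u{0436}" => "\u{0416}", // ж => Ж
    "\u{00DF}" => "SS",       // ß => SS (one character becomes two)
];

// Byte length of the UTF-8 character that starts with the given lead byte.
function utf8CharLength(int $lead): int
{
    if ($lead < 0x80) return 1;
    if (($lead & 0xE0) === 0xC0) return 2;
    if (($lead & 0xF0) === 0xE0) return 3;
    if (($lead & 0xF8) === 0xF0) return 4;
    return 1; // invalid lead byte: pass it through as a single byte
}

function unicodeToUpper(string $s): string
{
    $out = '';
    $n = strlen($s);
    for ($i = 0; $i < $n; $i += $len) {
        $len = utf8CharLength(ord($s[$i]));
        $char = substr($s, $i, $len);
        if ($len === 1) {
            // Single byte: the fixed-offset trick still works for a..z.
            $code = ord($char);
            $out .= ($code >= 97 && $code <= 122) ? chr($code - 32) : $char;
        } else {
            // Multi-byte: look the character up, keep it if unmapped.
            $out .= UPPER_MAP[$char] ?? $char;
        }
    }
    return $out;
}
```

With this excerpt, `unicodeToUpper("señor")` gives `SEÑOR` and `unicodeToUpper("straße")` gives `STRASSE`; any character missing from the table is returned unchanged. The sketch ignores locale-specific rules (for example the Turkish dotted and dotless i) and context-sensitive mappings, which a complete implementation would also need to handle.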

To understand the problem better, it is also worth reading:

What factors make PHP Unicode-incompatible?

I hope this is useful to you.

Upvotes: 1
