Reputation: 180
I was experimenting with multibyte strings and how to handle them. Using the code that you can see here
https://gist.github.com/charlydagos/89f67808e01f97e6de91
I was successful in rotating most strings. However, I noticed that the line
$chr = mb_substr($str, $i, 1);
will not work for flag emojis, since they use more than a single Unicode code point.
You can try the following in your own shell.
This gives the desired output: $ php string_rotate_mb.php "你好"
This, however: $ php string_rotate_mb.php "🇨🇭"
returns [H][C]
This is technically correct, since it did rotate the string. But the flag is really a single glyph, and my desired output is the flag alone. With a sequence of flags the output becomes even more garbled, sometimes even turning into different flags.
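To make the failure concrete, here is a minimal sketch that only needs the mbstring extension. The helper reverse_by_code_point() is hypothetical and is not the code from the gist; it assumes the script walks the string one code point at a time with mb_substr, as the line quoted above does, and that "rotating" means reversing.

```php
<?php
// Hypothetical helper (not the gist's code): reverses a string
// one code point at a time using mb_substr with a length of 1.
function reverse_by_code_point(string $str): string
{
    $out = '';
    for ($i = mb_strlen($str, 'UTF-8') - 1; $i >= 0; $i--) {
        $out .= mb_substr($str, $i, 1, 'UTF-8');
    }
    return $out;
}

// The flag is two regional indicator code points: C (U+1F1E8) and H (U+1F1ED).
$flag = "\u{1F1E8}\u{1F1ED}";

var_dump(mb_strlen($flag, 'UTF-8'));                              // int(2)
var_dump(reverse_by_code_point($flag) === "\u{1F1ED}\u{1F1E8}");  // bool(true)
```

The two indicators get swapped, so the result is H followed by C rather than the original flag.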
How can I, then, reliably determine that I should grab a $length = 1 or a $length = 2 (or a $length = N) substring using mb_substr?
For reference, I'm using PHP 7.0.2 (cli) (built: Jan 7 2016 10:40:26) ( NTS ), ZSH_VERSION = 5.2, LC_ALL=en_us.utf-8, and iTerm2: Build 2.9.git.8dff8db518.
Solution: https://gist.github.com/charlydagos/6755ad994da07a7b4959#file-string_rotate_working-php-L39-L56
Thank you roeland for introducing the concept of Grapheme Clusters. Good info also in the following links
Upvotes: 0
Views: 123
Reputation: 5741
There are a lot more examples where this fails:
Composing characters: compare ê and ê (the first one is actually U+0065 followed by U+0302)
Variants: e.g. emoji can have a black/white or color variant. This is done by adding a variation selector after the emoji (U+FE0E for text style, U+FE0F for emoji style). There is a similar problem with skin-tone variations, where the emoji is followed by one of the modifiers U+1F3FB to U+1F3FF. (Note: support for this is a bit spotty, but at least Windows 10 supports these variants.)
Flags, which consist of two code points.
Fractions using the fraction slash (U+2044) may be rendered with one glyph as well, e.g. 1⁄2. Note the difference from 1/2.
And so on…
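As a rough illustration of the cases above, the following sketch counts code points with mb_strlen. The sample strings are my own picks; in particular the thumbs-up base emoji (U+1F44D) is only an assumption chosen for the example.

```php
<?php
// Each sample is drawn as one glyph where the font supports it,
// yet most of them consist of several code points.
$samples = [
    'e + combining circumflex'   => "e\u{0302}",
    'precomposed e-circumflex'   => "\u{00EA}",
    'emoji + skin-tone modifier' => "\u{1F44D}\u{1F3FD}", // base emoji is an assumption
    'flag (regional indicators)' => "\u{1F1E8}\u{1F1ED}",
    'fraction slash'             => "1\u{2044}2",
];

$counts = [];
foreach ($samples as $label => $text) {
    // mb_strlen counts code points, not what the user perceives as characters.
    $counts[$label] = mb_strlen($text, 'UTF-8');
    printf("%-28s %d code point(s)\n", $label, $counts[$label]);
}
```

Only the precomposed character comes out as a single code point, so a fixed length of 1 in mb_substr would split every other sample.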
I think what you're looking for is called grapheme clusters. Without library support I think this is pretty difficult to get right.
For recent PHP versions there is the intl extension. You may loop over the clusters using the grapheme functions.
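A minimal sketch of such a loop, assuming the intl extension is loaded. The helper name reverse_by_grapheme() is made up for illustration, and it assumes the goal is to reverse the string; how a run of several flags is split depends on the ICU version behind intl, so I only show a single flag here.

```php
<?php
// Requires the intl extension (grapheme_* functions).
// Hypothetical helper: reverses a string one grapheme cluster at a time.
function reverse_by_grapheme(string $str): string
{
    $out = '';
    // grapheme_strlen() counts grapheme clusters rather than code points.
    $len = grapheme_strlen($str);
    for ($i = $len - 1; $i >= 0; $i--) {
        // Take exactly one cluster, however many code points it spans.
        $out .= grapheme_substr($str, $i, 1);
    }
    return $out;
}

$flag     = "\u{1F1E8}\u{1F1ED}";        // one flag, two code points
$input    = 'a' . $flag . "e\u{0302}";   // "a", flag, e + combining circumflex
$reversed = reverse_by_grapheme($input);

// The clusters change order, but each one stays intact.
var_dump($reversed === "e\u{0302}" . $flag . 'a');
```

Calling grapheme_substr() in a loop rescans the string each time, which is fine for short inputs; for long strings grapheme_extract() with its byte offset argument may be a better fit.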
Upvotes: 1