Reputation: 64919
Some emoji now combine. For instance, U+1f441 (👁) U+200d (ZWJ) U+1f5e8 (🗨) combine to make 👁🗨 (I am a witness). Rakudo 2016.07.1 on MoarVM 2016.07 says there are two graphemes:
> "\x[1f441]\x[200d]\x[1f5e8]".chars
2
I think that should be 1. It seems to have a similar problem with
> "\x[1f441]\x[fe0f]\x[200d]\x[1f5e8]\x[fe0f]".chars
2
But at least it handles U+fe0f (VS-16, emoji representation) correctly.
Are there plans to fix this in a later version of Perl 6, or am I misunderstanding the intent of the chars method?
Upvotes: 3
Views: 173
Reputation: 33618
The ZWJ sequence you mentioned is only part of Unicode Emoji 4.0, which is still in draft status and planned for release in November 2016. Under this new version, U+1F5E8 has the Grapheme_Cluster_Break property Glue_After_Zwj (GAZ), so a ZWJ followed by it does not break and the sequence should indeed form a single grapheme cluster.
I'm sure that Perl 6 will catch up at some point.
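In the meantime, a minimal sketch like the following may help separate the two counts involved. It assumes only the built-in Str methods codes, chars and ords; the variable name $seq is just for illustration. The code point count is fixed, while the grapheme count depends on which Unicode version the Rakudo/MoarVM build implements, so no particular value is claimed for it here.

```perl6
# The eye + ZWJ + left speech bubble sequence from the question.
my $seq = "\x[1f441]\x[200d]\x[1f5e8]";

# Number of code points: always 3 for this string.
say $seq.codes;

# Number of grapheme clusters: depends on the segmentation rules
# of the Unicode version the runtime implements.
say $seq.chars;

# List the individual code points for inspection.
say $seq.ords.map(*.fmt('U+%04X')).join(' ');
```

If chars prints 2 while codes prints 3, the runtime is attaching the ZWJ to the first emoji but still breaking before U+1F5E8, which matches the behaviour described in the question.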
Upvotes: 2