Chas. Owens

Reputation: 64919

How will Perl 6 handle the new combining emoji length?

Some emoji now combine. For instance, U+1f441 (👁) U+200d (ZWJ) U+1f5e8 (🗨) combine to make 👁‍🗨 (I am a witness). Rakudo 2016.07.1 on MoarVM 2016.07 says there are two graphemes:

> "\x[1f441]\x[200d]\x[1f5e8]".chars
2

I think that should be 1. It seems to have a similar problem with

> "\x[1f441]\x[fe0f]\x[200d]\x[1f5e8]\x[fe0f]".chars
2

But at least it handles U+fe0f (VS-16, the emoji presentation selector) correctly.

Are there plans to fix this in a later version of Perl 6, or am I misunderstanding the intent of the chars method?
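The following is a minimal sketch of how to see what .chars is working with. It compares .chars (graphemes) with .codes (code points) and lists each code point's name via .uninames, both of which are core Str methods. The variable name $witness is only illustrative. The .chars result is deliberately left without an expected value, because it depends on which Unicode version the Rakudo/MoarVM build implements.

```raku
# The eye-in-speech-bubble ZWJ sequence from the question (illustrative name)
my $witness = "\x[1f441]\x[200d]\x[1f5e8]";

# .codes counts code points, .chars counts graphemes
say $witness.codes;            # 3
say $witness.chars;            # depends on the Unicode version the build supports

# Print the Unicode name of each code point, one per line
.say for $witness.uninames;    # EYE, ZERO WIDTH JOINER, LEFT SPEECH BUBBLE
```

If .codes reports 3 while .chars reports 2, the ZWJ has been attached to one neighbour but the segmentation rules in use do not glue the following pictograph to it.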

Upvotes: 3

Views: 173

Answers (1)

nwellnhof

Reputation: 33618

The ZWJ sequence you mentioned is only part of Unicode Emoji 4.0, which is still in draft status and planned for release in November 2016. Under this new version, U+1F5E8 has the Grapheme_Cluster_Break property E_Base_GAZ (EBG), and the segmentation rule ZWJ × (Glue_After_Zwj | E_Base_GAZ) forbids a break between the ZWJ and a following EBG character, so the sequence should indeed form a single grapheme cluster.

I'm sure that Perl 6 will catch up at some point.

Upvotes: 2
