Reputation: 4527
I want to validate the name when a new user signs up on my page. One of the checks is that the name is no longer than 100 characters.
But since a single emoji like 👩❤️💋👩 (is that actually 4 emoji joined together? see screenshot) counts as much more than 1 character, I have trouble validating the name. I want to allow emoji in names, because these days it's quite common to have a heart, star or something similar there, but I don't want to allow names with more than 100 characters.
So my question is: how can I count an emoji like this as a single character when checking the length?
PS: I'm looking for a PHP solution, but I would also accept JavaScript, even though I'd prefer not to use it.
Edit: My example emoji seems to be this string: \ud83d\udc69\u200d\u2764\ufe0f\u200d\ud83d\udc8b\u200d\ud83d\udc69
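To make the mismatch concrete, here is a minimal JavaScript sketch (the variable names are only illustrative) that measures the escaped string above in two ways. JavaScript's `.length` counts UTF-16 code units, and spreading a string iterates it by code point; neither matches the single glyph shown on screen.

```javascript
// The escaped string from the edit above:
// woman + ZWJ + heart + variation selector + ZWJ + kiss mark + ZWJ + woman
const kiss = "\ud83d\udc69\u200d\u2764\ufe0f\u200d\ud83d\udc8b\u200d\ud83d\udc69";

const codeUnits = kiss.length;        // number of UTF-16 code units
const codePoints = [...kiss].length;  // number of Unicode code points

console.log(codeUnits);  // 11
console.log(codePoints); // 8
```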
Please notice the mentioned screenshot of this question:
Upvotes: 19
Views: 12130
Reputation: 614
As a potential JavaScript solution (if you don't mind adding a library), Lodash has tackled this problem in its toArray module.
For example,
_.toArray('12👪').length; // --> 3
Or, if you want to trim a string to a given number of glyphs, you can manipulate the array and rejoin it, like:
_.toArray("👪trimToEightGlyphs").splice(0,8).join(''); // --> '👪trimToE'
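If adding a library is not an option, a similar result can probably be had from the built-in `Intl.Segmenter`. This is a rough sketch rather than a drop-in replacement for Lodash: it assumes your runtime ships `Intl.Segmenter` (recent browsers and recent Node.js versions do), and `toGraphemes` and `trimToGraphemes` are hypothetical helper names, not part of any library.

```javascript
// Assumption: the runtime supports Intl.Segmenter.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: split a string into user-perceived characters.
function toGraphemes(text) {
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// Hypothetical helper: keep at most `max` user-perceived characters.
function trimToGraphemes(text, max) {
  return toGraphemes(text).slice(0, max).join("");
}

console.log(toGraphemes("12👪").length);                // 3
console.log(trimToGraphemes("👪trimToEightGlyphs", 8)); // '👪trimToE'
```

Unlike `split('')`, this never cuts through the middle of a surrogate pair or a joined emoji sequence.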
Upvotes: 16
Reputation: 36989
Unicode defines abstract characters as code points, but what allows them to be rendered on screen is the font. A font is a collection of graphical shapes, called glyphs, each of which is the visual representation of a code point or a sequence of code points. A sequence of one or more code points that is displayed as a single graphical unit is called a grapheme.
If you need to get the length in grapheme units (and NOT in code points, as mb_strlen would give you), you can use grapheme_strlen from the intl extension:
$emoji = "\u{1F469}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F469}"; // \u{...} escapes need PHP 7+
echo $emoji , " : " , strlen($emoji) , "\n"; // 27, counts bytes (UTF-8)
echo $emoji , " : " , mb_strlen($emoji) , "\n"; // 8, counts code points
echo $emoji , " : " , grapheme_strlen($emoji) , "\n"; // 1, counts grapheme units
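Since the question also accepts JavaScript, the same grapheme-based check could be sketched on that side too. This is only an illustration under a few assumptions: the runtime supports `Intl.Segmenter`, the limit of 100 comes from the question, and `graphemeLength`, `isValidName` and `MAX_NAME_GRAPHEMES` are hypothetical names.

```javascript
const MAX_NAME_GRAPHEMES = 100; // limit taken from the question

// Assumption: the runtime supports Intl.Segmenter.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: count user-perceived characters (grapheme clusters).
function graphemeLength(text) {
  let count = 0;
  for (const _part of graphemeSegmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// Hypothetical validator: non-empty and at most MAX_NAME_GRAPHEMES graphemes.
function isValidName(name) {
  const length = graphemeLength(name);
  return length >= 1 && length <= MAX_NAME_GRAPHEMES;
}

const kissEmoji = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F48B}\u200D\u{1F469}";
console.log(graphemeLength(kissEmoji));          // 1
console.log(isValidName(kissEmoji.repeat(100))); // true
console.log(isValidName("a".repeat(101)));       // false
```

Keep in mind that one grapheme can span many code points, so if the name goes into a fixed-size database column, a separate byte limit may still be worth checking.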
Upvotes: 10