Reputation: 797
I use this preg_replace() call to remove all emojis from a string:
$data['message'] = preg_replace("/([0-9|#][\x{20E3}])|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u", "", $data['message']);
This works very well, but I don't want to remove them. Instead, I have to replace them with a bbcode. That means every emoji in a string should be replaced with its own bbcode.
Example:
U+1F600
becomes
[emoji]1f600[/emoji]
or
U+1F603
becomes
[emoji]1f603[/emoji]
Is this possible? Thank you very much.
Upvotes: 1
Views: 2544
Reputation: 4244
I tried the pattern you are using, and it seems that it doesn't match all the existing Unicode emojis.
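As a small sketch of one such gap: the only alternative in the question's pattern that reaches above U+FFFF is [\x{1F000}-\x{1F6FF}], so an emoji from the U+1F900 block, such as U+1F914 (thinking face), is not matched. The variable names below are only illustrative.

```php
<?php
// Last alternative of the question's pattern, the only one covering code points above U+FFFF.
$astral = '/[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u';

// Two sample emojis, keyed by their hex code point.
$samples = ['1f600' => "\u{1F600}", '1f914' => "\u{1F914}"];

$results = [];
foreach ($samples as $hex => $char) {
    // preg_match() returns 1 when the pattern matches, 0 when it does not.
    $results[$hex] = preg_match($astral, $char) === 1;
}

var_export($results); // 1f600 is matched, 1f914 is not
```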
In fact, I find it quite complicated to fully understand how emojis are built, especially because they start from one code point and can then be modified with sequences. Typically, you can have a variation of an emoji to change the skin color of a face. Another example is the family icon of two women and two girls, 👩‍👩‍👧‍👧:
you build it with this sequence: 👩 + U+200D + 👩 + U+200D + 👧 + U+200D + 👧
I found an interesting explanation here: https://www.contentful.com/blog/2016/12/06/unicode-javascript-and-the-emoji-family/
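To make the sequence idea concrete, here is a minimal sketch that lists the code points of the family emoji. The helper name codePointsOf() is hypothetical, and the code assumes the mbstring extension and PHP 7.4 or later.

```php
<?php
// Hypothetical helper: lists the code points of a string as U+XXXX labels.
// Assumes the mbstring extension and PHP 7.4+ (mb_str_split, arrow functions).
function codePointsOf(string $text): array
{
    return array_map(
        fn(string $char): string => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
        mb_str_split($text, 1, 'UTF-8')
    );
}

// The family emoji, written with escapes so each code point is visible.
$family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F467}";

echo implode(' ', codePointsOf($family)), PHP_EOL;
// U+1F469 U+200D U+1F469 U+200D U+1F467 U+200D U+1F467
```

What looks like one icon is seven code points, which is why a pattern built from single-character ranges only sees the individual people.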
But to come back to your question, your preg_replace() seems to be a quick, almost working solution. I first wrote another pattern which seems to include more cases:
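$data['message'] = preg_replace(
    // Caution: \x{24C2}-\x{1F251} is one very wide range; it also covers non-emoji characters such as CJK ideographs.
    '/[\x{1F600}-\x{1F64F}\x{2700}-\x{27BF}\x{1F680}-\x{1F6FF}\x{24C2}-\x{1F251}\x{1F30D}-\x{1F567}\x{1F900}-\x{1F9FF}\x{1F300}-\x{1F5FF}]/u',
    '[emoji]$0[/emoji]', // $0 is the whole match, i.e. the emoji character itself
    $data['message']
);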
Don't ask me how I found the list of ranges. I just googled around and then had a try with regex101 until it worked with a huge bunch of emojis. I saved it here: https://regex101.com/r/bLuezV/2
But, I noticed that my test on regex101 just above isn't working for the sequences (typically the family example above). It looks like it is working because of the highlight, but no, it's not handling the sequences. So we have to find a better regex pattern! This is way too much work to handle all the correct cases and the pattern will be difficult to maintain.
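For illustration only, here is a rough sketch of what handling simple ZWJ sequences could look like while also writing the hex code point into the bbcode, as the question asks. The function name emojiToBbcode(), the reduced set of ranges, and the choice to join the code points of a sequence with "-" are all assumptions of this sketch, not a complete emoji matcher. It needs the mbstring extension for mb_ord().

```php
<?php
// Hypothetical helper: wraps each matched emoji in [emoji]...[/emoji],
// writing its code point(s) as lowercase hex, joined with "-" for sequences.
function emojiToBbcode(string $text): string
{
    // One base code point (illustrative subset of ranges, NOT the full Unicode list),
    // optionally followed by the variation selector U+FE0F or a skin-tone modifier.
    $base = '[\x{1F300}-\x{1F6FF}\x{1F900}-\x{1F9FF}\x{2600}-\x{27BF}]'
          . '[\x{FE0F}\x{1F3FB}-\x{1F3FF}]?';

    // A base, then any number of bases glued on with U+200D (zero-width joiner).
    $pattern = '/' . $base . '(?:\x{200D}' . $base . ')*/u';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $hex = [];
        // Split the matched text into single code points and convert each to hex.
        foreach (preg_split('//u', $m[0], -1, PREG_SPLIT_NO_EMPTY) as $char) {
            $hex[] = dechex(mb_ord($char, 'UTF-8'));
        }
        return '[emoji]' . implode('-', $hex) . '[/emoji]';
    }, $text);

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8); keep the input then.
    return $result ?? $text;
}

echo emojiToBbcode("Hi \u{1F600}!"), PHP_EOL; // Hi [emoji]1f600[/emoji]!
```

Even with this, flags, keycaps and newer blocks are not covered, which is the maintenance burden described above.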
I think it's better to use a specific lib to do what you are looking for. It will certainly be updated when new emojis come out.
The text file containing all the latest sequences:
https://github.com/mathiasbynens/emoji-test-regex-pattern/blob/main/dist/latest/index.txt
This PHP project could help you: https://github.com/aaronpk/emoji-detector-php
Also, consider using any other PHP Composer package (as projects often change and are replaced by newer ones). Typically, just by searching for emoji on packagist.org: https://packagist.org/?query=emoji
v flag
In JavaScript, since 2023, the v flag (unicodeSets) gives us the possibility to match simple emoji icons, but also emojis made of multiple code points, with the help of \p{RGI_Emoji}.
This makes things much easier: https://regex101.com/r/bLuezV/6
\p{Emoji} matches an emoji made of a single code point. But, warning, it will also match a digit [0-9], # or * because they are declared as emoji characters! Don't ask me why, as I don't find it very logical.
\p{RGI_Emoji} matches complete emojis, including those made of multiple code points.
Upvotes: 6
Reputation: 364
preg_match_all should be the function to help you (preg_match stops at the first match). That way you get an array with all of the emojis found. Then you can use str_replace to replace them.
For example:
// $regex is your emoji pattern, e.g. the one from the question.
$foundemoji = preg_match_all($regex, $data['message'], $found);
// $found[0] holds every full match; array_unique() avoids wrapping the same emoji twice.
foreach (array_unique($found[0]) as $emoji) {
    $data['message'] = str_replace($emoji, "[emoji]" . $emoji . "[/emoji]", $data['message']);
}
Edit: $foundemoji is the number of matches: 0 if nothing matched the regex, or false on error. The found emojis are in $found[0].
Upvotes: 0