Reputation: 1097
I am trying to strip all non-allowed characters from a string using a regex. Here is my current PHP code:
$input = "👮";
$pattern = "[a-zA-Z0-9_ !@#$%^&*();\\\/|<>\"'+\-.,:?=]";
$message = preg_replace($pattern,"",$input);
if (empty($message)) {
    echo "The string is empty";
} else {
    echo $message;
}
When I run this, the emoji is printed, but I want it to print "The string is empty".
When I put my regex into http://regexr.com/, it shows that the emoji does not match, yet the emoji is still printed when I run the code. Any suggestions?
Upvotes: 4
Views: 4729
Reputation: 1128
This pattern should do the trick:
$filteredString = preg_replace('/([^-\p{L}\x00-\x7F]+)/u', '', $rawString);
Some sequences are quite rare, so let's explain them:
\p{L}: matches any kind of letter from any language.
\x00-\x7F: a single character in the range between index 0 and index 127, i.e. the ASCII range (case sensitive).
u: a modifier that turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8.
Upvotes: 8
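To see what the \p{L} / \x00-\x7F pattern keeps and removes, here is a minimal sketch. The sample input is made up for illustration, and the redundant capture group from the original pattern is dropped since nothing refers to it.

```php
<?php
// Hypothetical sample input: ASCII text, two accented letters, and an emoji.
$rawString = "Héllo wörld 👮!";

// Remove every run of characters that is not a hyphen, a Unicode letter,
// or an ASCII character (U+0000 to U+007F). The u modifier makes PCRE
// treat the pattern and the subject as UTF-8.
$filteredString = preg_replace('/[^-\p{L}\x00-\x7F]+/u', '', $rawString);

// preg_replace() returns null on error, for example when the subject
// is not valid UTF-8 and the u modifier is set.
if ($filteredString === null) {
    echo "Invalid input";
} else {
    echo $filteredString; // Héllo wörld !
}
```

Note that this keeps accented and other non-ASCII letters as well as every ASCII character, so it is broader than the explicit whitelist in the question.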
Reputation: 76656
Your pattern is incorrect. If you want to strip all the characters that are not in the list provided, you have to use a negated character class: [^...]. Also, [ and ] are currently being used as the delimiters, which means the pattern isn't seen as a character class.
The pattern should be:
$pattern = "~[^a-zA-Z0-9_ !@#$%^&*();\\\/|<>\"'+.,:?=-]~";
This should now strip the emoji, so for the input above the script prints "The string is empty".
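Putting the corrected pattern back into the original script could look like the sketch below. The function name stripDisallowed is hypothetical, and mapping a preg_replace() error to an empty string is an assumption made here for brevity, not something the question specifies.

```php
<?php
// Hypothetical helper; the pattern is the corrected one from this answer.
function stripDisallowed(string $input): string
{
    // ~ is the delimiter, and [^...] matches every character that is
    // NOT in the allowed list, so those characters get removed.
    $pattern = "~[^a-zA-Z0-9_ !@#$%^&*();\\\/|<>\"'+.,:?=-]~";
    $result = preg_replace($pattern, "", $input);

    // preg_replace() returns null on error; treated as empty here (assumption).
    return $result ?? "";
}

$message = stripDisallowed("👮");

// Compare against "" instead of using empty(), because empty("0") is true.
if ($message === "") {
    echo "The string is empty";
} else {
    echo $message;
}
```

Without the u modifier the subject is processed byte by byte. That still works for this whitelist, because every byte of a multi-byte UTF-8 character lies outside the ASCII range and is therefore removed.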
Upvotes: 4