Sarah

Reputation: 2003

Can you help me rewrite my JavaScript regex using PHP preg_replace?

I have created a JavaScript regular expression to validate comments entered by users in my app. The regex allows letters, numbers, some special symbols and a range of emojis.

I received help here to correctly format my JavaScript regular expression, and the final expression I am using is as follows:

Javascript Regex:

commentRegex =    /^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$/;

I was advised to perform the same validation on the server side (with PHP), so I am trying to perform a similar process using preg_replace().

I would like to replace all characters that are not allowed by the regex with the empty string. Here is my attempt, however it is not working. Thanks for any help.

PHP

$commentText = preg_replace('#^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$#', '', $commentText);

Edit:

After taking your advice in the comments I now have the following regex.

$postText = preg_replace('/^(?:[A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9} \/.,\-_$!\'&*()="?\#\+%:;\<\[\]\r\n]|(?:\x{d83c}[\x{df00}-\x{dfff}])|(?:\x{d83d}[\x{dc00}-\x{de4f}\x{de80}-\x{deff}]))*$/', '', $postText);

However I am getting a warning

Warning: preg_replace(): Compilation failed: character value in \x{} or \o{} is too large at offset 30 in submit_post.php on line 37

Upvotes: 0

Views: 306

Answers (3)

Wiktor Stribiżew

Reputation: 626738

In short: use

$re = '/[^A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9} \/.,\-_$!\'&*()="?#+%:;<[\]\r\n\x{1F300}-\x{1F3FF}\x{1F400}-\x{1F64F}\x{1F680}-\x{1F6FF}]+/u';
$text = 'test>><<<®¥§';
echo preg_replace($re, '', $text);

See the PHP demo.
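For comparison, here is a minimal sketch of the same "strip everything outside the allowed set" idea on the JavaScript side. The function name sanitizeComment is hypothetical; the character set simply mirrors the PHP pattern above, with the emoji ranges written as \u{...} code point escapes, which require the u flag.

```javascript
// Hypothetical helper: removes every character NOT in the allowed set.
// The `u` flag makes the regex work on code points, so \u{1F300}-\u{1F3FF}
// is a real range of emoji rather than a pair of surrogate code units.
const DISALLOWED = /[^A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!'&*()="?#+%:;<[\]\r\n\u{1F300}-\u{1F3FF}\u{1F400}-\u{1F64F}\u{1F680}-\u{1F6FF}]+/gu;

function sanitizeComment(text) {
  // Replace each run of disallowed characters with the empty string.
  return text.replace(DISALLOWED, '');
}

console.log(sanitizeComment('test>><<<®¥§')); // same sample text as the PHP snippet
```

Note that > is not in the allowed set while < is, so only the < characters of the sample survive next to the letters.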

A bit of an explanation:

  • Escape only the characters that are regex metacharacters in context, plus the regex delimiter itself (if you choose # as the delimiter, escape the # inside the pattern, and then there is no need to escape /).
  • PCRE does not support \uXXXX; use the \x{XXXX} notation instead.
  • Since the text to be processed is Unicode and the pattern contains characters outside the ASCII range, you have to use the /u (UNICODE) modifier. Without it, PCRE rejects any \x{...} value above FF, which is what the "character value in \x{} or \o{} is too large" warning reports.
  • As most emojis lie outside the BMP and the string is now treated as a sequence of Unicode code points, these symbols must be written as single code points with the extended \x{XXXXX} notation, not as the surrogate pairs (two 16-bit code units) used in JavaScript.
  • Your 3 alternatives can be merged into 1 big character class, which you then negate by adding ^ at its start, so that the pattern matches the characters to remove.
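To see where the \x{1F300}-style ranges come from, here is a small sketch of the standard UTF-16 surrogate pair arithmetic, applied to the pairs from the question. The helper names are made up for illustration.

```javascript
// Combine a UTF-16 high/low surrogate pair into one Unicode code point.
function surrogatePairToCodePoint(high, low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

// Format a code point in the \x{...} notation that PCRE accepts with /u.
function toPcreEscape(codePoint) {
  return '\\x{' + codePoint.toString(16).toUpperCase() + '}';
}

// The three surrogate ranges used in the JavaScript regex of the question.
const ranges = [
  [[0xD83C, 0xDF00], [0xD83C, 0xDFFF]],
  [[0xD83D, 0xDC00], [0xD83D, 0xDE4F]],
  [[0xD83D, 0xDE80], [0xD83D, 0xDEFF]],
];

const pcreRanges = ranges.map(([from, to]) =>
  toPcreEscape(surrogatePairToCodePoint(...from)) + '-' +
  toPcreEscape(surrogatePairToCodePoint(...to))
);

console.log(pcreRanges.join(''));
```

The three resulting ranges are the ones that appear at the end of the character class in the PHP pattern.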

Upvotes: 1

Scott Weaver

Reputation: 7351

Convert the \u.... sequences to \x{....}, and the result appears to be a valid PHP regular expression.

pattern: \\u(\w{4})

replace: \\x{$1}

regex101 demo
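As a rough sketch, the same search and replace can be run in JavaScript over the pattern source. The function name jsEscapesToPcre is only illustrative; the pattern and replacement are the ones given above.

```javascript
// Rewrite JavaScript-style \uXXXX escapes as PCRE-style \x{XXXX}.
// Operates on the pattern as plain text, so the backslashes are literal.
function jsEscapesToPcre(patternSource) {
  return patternSource.replace(/\\u(\w{4})/g, '\\x{$1}');
}

// A fragment of the character class from the question, as plain text.
const fragment = '[A-Za-z0-9\\u00C0-\\u017F\\u20AC]';
console.log(jsEscapesToPcre(fragment));
```

This is a purely textual rewrite: surrogate halves such as \ud83c become \x{d83c}, which PCRE does not accept as a code point in Unicode mode, so the emoji alternatives still have to be rewritten as real code point ranges by hand.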

Upvotes: 1

Fabian N.

Reputation: 1240

A regex in PHP is surrounded by a delimiter character. In your case you are using the hash (#), but the delimiter must not occur unescaped inside the regex itself, which it does...

You have to escape this character inside the pattern, or use another delimiter. Why did you not use the same "/" as in the JS version? The benefit is that it is already escaped.

I have not checked whether the rest would work, but I think so.

$commentText = preg_replace('/^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$/', '', $commentText);

should work.

Upvotes: 1
