Orinoco
Orinoco

Reputation: 177

Different results between preg_replace & preg_match_all

I have a forum that supports hashtags. I'm using the following line to convert all hashtags into links. I'm using the (^|\(|\s|>) pattern to avoid picking up named anchors in URLs.

$str=preg_replace("/(^|\(|\s|>)(#(\w+))/","$1<a href=\"/smalltalk.php?Tag=$3&amp;".SID."\">$2</a>",$str);

I'm using this line to pick up hashtags to store them in a separate field when the user posts their message, this picks up all hashtags EXCEPT those at the start of a new line.

preg_match_all("/(^|\(|\s|>)(#(\w+))/",$Content,$Matches);

Using the m & s modifiers doesn't make any difference. What am I doing wrong in the second instance?

Edit: the input text could be plain text or HTML. Example of problem input:

#startoftextreplacesandmatches #afterwhitespacereplacesandmatches <b>#insidehtmltagreplacesandmatches</b> :)
#startofnewlinereplacesbutdoesnotmatch :(

Upvotes: 2

Views: 749

Answers (1)

DaveRandom
DaveRandom

Reputation: 88677

Your replace operation has a problem which you have evidently not yet come across - it will allow unescaped HTML special characters through. The reason I know this is because your regex allows hashtags to be prefixed with >, which is a special character.

For that reason, I recommend you use this code to do the replacement, which will double up as the code for extracting the tags to be inserted into the database:

$hashtags = array();

$expr = '/(?:(?:(^|[(>\s])#(\w+))|(?P<notag>.+?))/';

$str = preg_replace_callback($expr, function($matches) use (&$hashtags) {
    if (!empty($matches['notag'])) {
        // This takes care of HTML special characters outside hashtags
        return htmlspecialchars($matches['notag']);
    } else {
        // Handle hashtags
        $hashtags[] = $matches[2];
        return htmlspecialchars($matches[1]).'<a href="/smalltalk.php?Tag='.htmlspecialchars(urlencode($matches[2])).'&amp;'.SID.'">#'.htmlspecialchars($matches[2]).'</a>';
    }
}, $str);

After the above code has been run, $str will contain the modified string, properly escaped for direct output, and $hashtags will be populated with all the tags matched.

See it working

Upvotes: 2

Related Questions