Reputation: 501
I have a custom markup parsing function that has been working very well for many years. I recently discovered a bug that I hadn't noticed before and I haven't been able to fix it. If anyone can help me with this that'd be awesome. So I have a custom built forum and text based MMORPG and every input is sanitized and parsed for bbcode like markup. It'll also parse out URL's and make them into legit links that go to an exit page with a disclaimer that you're leaving the site... So the issue that I'm having is that when I user posts multiple URL's in a text box (let's say \n delimited) it'll only convert every other URL into a link. Here's the parser for URL's:
$markup = preg_replace("/(^|[^=\"\/])\b((\w+:\/\/|www\.)[^\s<]+)" . "((\W+|\b)([\s<]|$))/ei", '"$1<a href=\"out.php?".shortURL("$2")."\" target=\"_blank\">".shortURL("$2")."</a>$4"', $markup);
As you can see it calls a PHP function, but that's not the issue here. Then entire text block is passed into this preg_replace at the same time rather than line by line or any other means.
Example INPUT:
http://skylnk.co/tRRTnb
http://skylnk.co/hkIJBT
http://skylnk.co/vUMGQo
http://skylnk.co/USOLfW
http://skylnk.co/BPlaJl
http://skylnk.co/tqcPbL
http://skylnk.co/jJTjRs
http://skylnk.co/itmhJs
http://skylnk.co/llUBAR
http://skylnk.co/XDJZxD
Example OUTPUT:
<a href="out.php?http://skylnk.co/tRRTnb" target="_blank">http://skylnk.co/tRRTnb</a>
<br>http://skylnk.co/hkIJBT
<br><a href="out.php?http://skylnk.co/vUMGQo" target="_blank">http://skylnk.co/vUMGQo</a>
<br>http://skylnk.co/USOLfW
<br><a href="out.php?http://skylnk.co/BPlaJl" target="_blank">http://skylnk.co/BPlaJl</a>
<br>http://skylnk.co/tqcPbL
<br><a href="out.php?http://skylnk.co/jJTjRs" target="_blank">http://skylnk.co/jJTjRs</a>
<br>http://skylnk.co/itmhJs
<br><a href="out.php?http://skylnk.co/llUBAR" target="_blank">http://skylnk.co/llUBAR</a>
<br>http://skylnk.co/XDJZxD
<br>
Upvotes: 0
Views: 400
Reputation: 56809
e
flag in preg_replace
is deprecated. You can use preg_replace_callback
to access the same functionality.
i
flag is useless here, since \w
already matches both upper case and lower case, and there is no backreference in your pattern.
I set m
flag, which makes the ^
and $
matches the beginning and the end of a line, rather than the beginning and the end of the entire string. This should fix your weird problem of matching every other line.
I also make some of the groups non-capturing (?:pattern)
- since the bigger capturing groups have captured the text already.
The code below is not tested. I only tested the regex on regex tester.
preg_replace_callback(
"/(^|[^=\"\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/m",
function ($m) {
return "$m[1]<a href=\"out.php?".shortURL($m[2])."\" target=\"_blank\">".shortURL($m[2])."</a>$m[3]";
},
$markup
);
Upvotes: 1