Lou

Reputation: 2509

PHP Regex replace is not producing the desired result

I'm creating a dictionary application in PHP and MariaDB, and trying to simulate some basic markdown. When I have a definition like this:

This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].

Then [foo] will be translated into a link to the 'foo' definition page, and [aliased link|bar] will translate into a link to the 'bar' definition page. If there's a pipe then whatever's before the pipe (|) will become the link text, and after the pipe becomes the link destination. If there's no pipe, then the expression in brackets becomes the link text and destination.

So I would translate this to the following HTML:

This is an example definition. Here is a link to <a href="foo">foo</a>. This is an <a href="bar">aliased link</a>.

The easiest way I could think of to do this was through two regex replaces. So let's say my example string is called $def, here is the code I've tried to make these replacements:

$pattern1 = '/\[(.*?)?\]/m';
$replace1 = '<a href="$1">$1</a>';
$def = preg_replace($pattern1, $replace1, $def);

$pattern2 = '/\[([^]]*?)(?:\|([^]]*?))\]/m';
$replace2 = '<a href="$2">$1</a>';
$def = preg_replace($pattern2, $replace2, $def);

(I assumed it would be easier to do it using two regexes, but if there's a simpler one-regex solution I'd love to know.)

However, I've clearly got something wrong with the regex, as this is what happens when I echo $def (the links are just illustrative for now, the destination isn't important):

This is an example definition. Here is a link to foo. This is an aliased link|bar.

And the HTML:

"This is an example definition. Here is a link to "
<a href="foo">foo</a>
". This is an" 
<a href="aliased link|bar">aliased link|bar</a>
"."

Can anyone advise what I need to change in the regex to get my desired result? I'm especially confused because when I test this regex on www.regex101.com, it seems to do exactly what I think it should do.

I'm using PHP 7.4.6 on Google Chrome, with XAMPP and Apache.

Upvotes: 1

Views: 39

Answers (1)

The fourth bird

Reputation: 163207

Note that in the pattern you used, you can also exclude the | from the first negated character class, which prevents some backtracking. The quantifier on the negated character class also does not have to be non-greedy (*?), because the class cannot match past the closing ].
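As a rough sketch of what those two changes look like on the second pattern from the question (the variable names $original and $tightened are only for illustration), both versions capture the same groups for a piped link, and the tightened one still ignores a plain link:

```php
<?php
// Second pattern from the question, unchanged.
$original  = '/\[([^]]*?)(?:\|([^]]*?))\]/m';
// Same idea with | excluded from the first class and greedy quantifiers.
$tightened = '/\[([^]|]*)\|([^]]*)\]/';

preg_match($original,  '[aliased link|bar]', $a);
preg_match($tightened, '[aliased link|bar]', $b);

// Both capture the link text in group 1 and the destination in group 2.
var_dump($a === $b);

// A link without a pipe is not matched by the tightened pattern.
var_dump(preg_match($tightened, '[foo]'));
```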

You could use 2 capture groups where the second group is in an optional part and check for the presence of group 2 using preg_replace_callback.

\[([^][|]+)(?:\|([^][]+))?]

The pattern matches:

  • \[ Match [
  • ([^][|]+) Capture group 1, match 1+ times any char except [ ] and |
  • (?:\|([^][]+))? Optional non-capture group matching | followed by capture group 2, which matches 1+ times any char except [ and ]
  • ] Match closing ]
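To see why checking for group 2 works, here is a small sketch that runs the pattern with preg_match on each kind of link (the variable names $plain and $aliased are only for illustration). When the optional part does not take part in the match, preg_match leaves the trailing group out of the result array, so index 2 simply does not exist:

```php
<?php
$pattern = '/\[([^][|]+)(?:\|([^][]+))?\]/';

preg_match($pattern, '[foo]', $plain);
preg_match($pattern, '[aliased link|bar]', $aliased);

print_r($plain);   // indexes 0 and 1 only
print_r($aliased); // indexes 0, 1 and 2
```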


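// Group 1 is the link text; the optional group 2 is the destination after the pipe.
$pattern = "/\[([^][|]+)(?:\|([^][]+))?\]/";
$s = "This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].";
$s = preg_replace_callback($pattern, function($match){
    $template = '<a href="%s">%s</a>';
    // Without a pipe, group 2 is absent from $match, so group 1 is used for both.
    return sprintf($template, array_key_exists(2, $match) ? $match[2] : $match[1], $match[1]);
}, $s);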

echo $s;

Output

This is an example definition. Here is a link to <a href="foo">foo</a>. This is an <a href="bar">aliased link</a>.
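As for why the two-pass version in the question did not work: as far as I can tell, the first pattern also matches [aliased link|bar] as a whole, so by the time the second replacement runs there are no brackets left for it to find. A minimal sketch, assuming you want to keep both of the question's patterns exactly as they are, is to run the pipe replacement first:

```php
<?php
$def = 'This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].';

// The question's first pattern matches the piped link as a whole.
preg_match_all('/\[(.*?)?\]/m', $def, $m);
print_r($m[1]); // foo, aliased link|bar

// Pipe pattern first, then the plain pattern on whatever brackets remain.
$def = preg_replace('/\[([^]]*?)(?:\|([^]]*?))\]/m', '<a href="$2">$1</a>', $def);
$def = preg_replace('/\[(.*?)?\]/m', '<a href="$1">$1</a>', $def);

echo $def;
```

This gives the same HTML as the callback version for the example string. The single-pattern approach is probably still easier to maintain, since it does not depend on the order of the passes.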

Upvotes: 2
