Reputation: 2509
I'm creating a dictionary application in PHP and MariaDB, and trying to simulate some basic markdown. When I have a definition like this:
This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].
Then [foo] will be translated into a link to the 'foo' definition page, and [aliased link|bar] will translate into a link to the 'bar' definition page. If there's a pipe then whatever's before the pipe (|) will become the link text, and after the pipe becomes the link destination. If there's no pipe, then the expression in brackets becomes the link text and destination.
So I would translate this to the following HTML:
This is an example definition. Here is a link to <a href="foo">foo</a>. This is an <a href="bar">aliased link</a>.
The easiest way I could think of to do this was through two regex replaces. So let's say my example string is called $def, here is the code I've tried to make these replacements:
$pattern1 = '/\[(.*?)?\]/m';
$replace1 = '<a href="$1">$1</a>';
$def = preg_replace($pattern1, $replace1, $def);
$pattern2 = '/\[([^]]*?)(?:\|([^]]*?))\]/m';
$replace2 = '<a href="$2">$1</a>';
$def = preg_replace($pattern2, $replace2, $def);
(I assumed it would be easier to do it using two regexes, but if there's a simpler one-regex solution I'd love to know.)
However, I've clearly got something wrong with the regex, as this is what happens when I echo $def (the links are just illustrative for now, the destination isn't important):
This is an example definition. Here is a link to foo. This is an aliased link|bar.
And the HTML:
"This is an example definition. Here is a link to "
<a href="foo">foo</a>
". This is an"
<a href="aliased link|bar">aliased link|bar</a>
"."
Can anyone advise what I need to do to fix the regex to get my desired result? I'm especially confused because when I test each regex on www.regex101.com, it seems to do exactly what I think it should do.
I'm using PHP 7.4.6 on Google Chrome, with XAMPP and Apache.
Upvotes: 1
Views: 39
Reputation: 163207
Note that in the pattern that you used, you can exclude matching the | by adding it to the first negated character class, which prevents some backtracking. The quantifier for the negated character class also does not have to be non-greedy (*?), as the ] cannot be crossed at the end.
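As for why the two-step attempt in the question gives the wrong result: the first pattern, \[(.*?)?\], also matches [aliased link|bar], so by the time the second replacement runs there are no brackets left for it to match. If you want to keep two separate replacements, a minimal sketch (assuming the same $def string as in the question) is to exclude | from the plain pattern and run the piped pattern first:

```php
<?php
$def = 'This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].';

// Handle the piped form first, so the plain-bracket pattern cannot consume it.
$def = preg_replace('/\[([^][|]+)\|([^][]+)\]/', '<a href="$2">$1</a>', $def);

// Then handle the remaining plain [term] links; | is excluded from the class.
$def = preg_replace('/\[([^][|]+)\]/', '<a href="$1">$1</a>', $def);

echo $def, PHP_EOL;
```

With | excluded from the plain pattern the order no longer strictly matters, but running the piped pattern first makes the intent clearer.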
You could use 2 capture groups where the second group is in an optional part and check for the presence of group 2 using preg_replace_callback.
\[([^][|]+)(?:\|([^][]+))?]
The pattern matches:
\[ Match [
([^][|]+) Capture group 1, match 1+ times any char except [ ] and |
(?:\|([^][]+))? Optional non capture group matching | and capture 1+ times any char except [ and ] in group 2
] Match closing ]
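To see how the two groups behave, here is a small sketch that runs the pattern against each link form with preg_match. The variable names $plain and $aliased are only for illustration. It relies on PHP omitting trailing groups that did not take part in the match, which is what makes the array_key_exists check possible.

```php
<?php
$pattern = '/\[([^][|]+)(?:\|([^][]+))?\]/';

// Plain link: only group 1 participates, so index 2 is not set at all.
preg_match($pattern, '[foo]', $plain);
var_dump(array_key_exists(2, $plain));   // bool(false)

// Aliased link: group 1 is the link text, group 2 the destination.
preg_match($pattern, '[aliased link|bar]', $aliased);
var_dump(array_key_exists(2, $aliased)); // bool(true)
print_r($aliased);
```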
$pattern = "/\[([^][|]+)(?:\|([^][]+))?\]/";
$s = "This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].";
$s = preg_replace_callback($pattern, function($match){
    $template = '<a href="%s">%s</a>';
    // Group 2 only exists when there was a pipe; otherwise group 1 is also the destination.
    return sprintf($template, array_key_exists(2, $match) ? $match[2] : $match[1], $match[1]);
}, $s);
echo $s;
Output
This is an example definition. Here is a link to <a href="foo">foo</a>. This is an <a href="bar">aliased link</a>.
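If the definitions come from user input, you may also want to escape the values before putting them into the HTML. The following is only a sketch under that assumption: render_links is a hypothetical helper name, and the choice of htmlspecialchars with ENT_QUOTES is mine rather than something the question asked for.

```php
<?php
// render_links() is a hypothetical helper, not part of the original code.
function render_links(string $def): string
{
    $pattern = '/\[([^][|]+)(?:\|([^][]+))?\]/';

    // preg_replace_callback returns null on error; fall back to the input then.
    return preg_replace_callback($pattern, function (array $m): string {
        $text = $m[1];
        $dest = $m[2] ?? $m[1]; // no pipe: the text doubles as the destination
        return sprintf(
            '<a href="%s">%s</a>',
            htmlspecialchars($dest, ENT_QUOTES),
            htmlspecialchars($text, ENT_QUOTES)
        );
    }, $def) ?? $def;
}

echo render_links('See [foo] and [R&D|research].'), PHP_EOL;
```

The null coalescing operator (??) does the same job as the array_key_exists check above, just more compactly.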
Upvotes: 2