Reputation: 1077
Suppose I have a string that looks like:
"lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]"
The idea here is to replace each block of the form [[name][some text]] with <a href="name.html">some text</a>.
So I'm trying to use regular expressions to find blocks that look like [[name][some text]], but I'm having tremendous difficulty.
Here's what I thought should work (in PHP):
preg_match_all('/\[\[.*\]\[.*\]/', $my_big_string, $matches)
But this just returns a single match, the string from '[[merp' to 'blue]]'. How can I get it to return the two matches [[merp][that entry called merp]] and [[blue][blue]]?
Upvotes: 2
Views: 153
Reputation: 12389
Quantifiers like * are greedy by default, which means they match as much as possible while still letting the overall pattern succeed. E.g. in your sample, a regex like \[.*\] would match everything from the first [ to the last ] in the string. There are two ways to change the default behaviour and make quantifiers lazy (ungreedy, reluctant):
Use the U (PCRE_UNGREEDY) modifier to make all quantifiers lazy.
Put a ? after a specific quantifier. E.g. .*? matches as few of any characters as possible.
1.) Using the U modifier, a pattern could look like:
/\[\[(.*)]\s*\[(.*)]]/Us
Additionally, the s (PCRE_DOTALL) modifier is used to make the dot . also match newlines, and \s* allows optional whitespace between ] and [, which occurs in your sample string. \s is a shorthand for [ \t\r\n\f].
The two capturing groups (.*) hold the name and the link text, so both can be used in the replacement. Test on regex101.com
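To see the effect of the U modifier, here is a minimal sketch that runs the same pattern with and without it on the sample string from the question. The variable names ($text, $greedy, $lazy and the two counters) are made up for illustration.

```php
<?php
// Sample string from the question (note the space between "]" and "[").
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also '
      . 'to that entry called [[blue] [blue]]';

// Greedy quantifiers: one long match from the first "[[" to the last "]]".
$greedyCount = preg_match_all('/\[\[(.*)]\s*\[(.*)]]/s', $text, $greedy);

// Same pattern with the U modifier: every quantifier becomes lazy.
$lazyCount = preg_match_all('/\[\[(.*)]\s*\[(.*)]]/Us', $text, $lazy);

echo $greedyCount, "\n"; // 1
echo $lazyCount, "\n";   // 2
print_r($lazy[1]);       // first group of each match: the names
print_r($lazy[2]);       // second group of each match: the link texts
```

Keep in mind that U flips the default for every quantifier in the pattern, so a ? after a quantifier then makes it greedy again.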
2.) Instead, using ? to make each quantifier lazy individually:
/\[\[(.*?)]\s*\[(.*?)]]/s
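As a rough illustration of the per-quantifier variant, the sketch below collects each name and link text from the sample string. PREG_SET_ORDER groups the results per match, which makes them easy to loop over; $text, $matches and $pairs are illustrative names, and the array destructuring in foreach assumes PHP 7.1 or later.

```php
<?php
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also '
      . 'to that entry called [[blue] [blue]]';

// Each .*? is lazy on its own; \s* stays greedy because it has no "?".
preg_match_all('/\[\[(.*?)]\s*\[(.*?)]]/s', $text, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER every entry is [whole match, group 1, group 2].
$pairs = [];
foreach ($matches as [$whole, $name, $label]) {
    $pairs[$name] = $label;
}

print_r($pairs);
```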
3.) Alternative without modifiers, if no square brackets are expected inside [...]:
/\[\[([^]]*)]\s*\[([^]]*)]]/
This uses a negated character class (the leading ^ negates it): [^]]* matches any number of characters that are NOT ] in between [ and ]. It does not rely on greediness at all. Also, no dot . is used, so the s modifier is not needed.
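The following sketch shows both sides of the negated character class on an invented input: a link text that spans a line break still matches without the s modifier, while a block with square brackets inside the text does not match. The input string and variable names are assumptions made up for this example.

```php
<?php
// First block: link text contains a newline.
// Second block: link text contains square brackets, which [^]]* cannot cross.
$text = "see [[merp] [that entry\ncalled merp]] and [[a] [b [c] d]]";

$count = preg_match_all('/\[\[([^]]*)]\s*\[([^]]*)]]/', $text, $m);

echo $count, "\n";   // 1, only the first block matches
echo $m[1][0], "\n"; // merp
var_dump($m[2][0]);  // the link text, including the newline
```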
Replacement for all 3 examples according to your sample: <a href="\1">\2</a> where \1 corresponds to the match of the first parenthesized group, \2 to the second, and so on.
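Put together with preg_replace, the replacement could look like the sketch below. It uses the third pattern, though any of the three should behave the same on this input. The second call appends .html to the first group, on the assumption that the link target should be name.html as written in the question; $html and $htmlWithExt are illustrative names.

```php
<?php
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also '
      . 'to that entry called [[blue] [blue]]';

$pattern = '/\[\[([^]]*)]\s*\[([^]]*)]]/';

// \1 and \2 refer to the first and second capturing group.
$html = preg_replace($pattern, '<a href="\1">\2</a>', $text);

// Assumption: the target page is "<name>.html", as in the question.
$htmlWithExt = preg_replace($pattern, '<a href="\1.html">\2</a>', $text);

echo $html, "\n";
echo $htmlWithExt, "\n";
```

If the names or link texts can come from untrusted input, it would be worth escaping them (for example with htmlspecialchars inside a preg_replace_callback) before building the HTML.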
Upvotes: 2
Reputation: 4526
The regex you're looking for is \[\[(.+?)\]\s\[(.+?)\]\]; replace each match with <a href="$1">$2</a>
The parts of the string matched inside the () parentheses are captured and can be back-referenced using $1, $2, ...
Example on regex101.com
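One detail worth checking with this pattern: \s without a quantifier requires exactly one whitespace character between ] and [. That fits the sample string, but not the compact form [[name][some text]] used elsewhere in the question. The sketch below compares both cases; the test strings and variable names are made up for illustration.

```php
<?php
$strict   = '/\[\[(.+?)\]\s\[(.+?)\]\]/';  // pattern from this answer
$optional = '/\[\[(.+?)\]\s*\[(.+?)\]\]/'; // variant: whitespace optional

$spaced  = 'see [[merp] [that entry called merp]]';
$compact = 'see [[merp][that entry called merp]]';

$replacement = '<a href="$1">$2</a>'; // $1 and $2 are the captured groups

$a = preg_replace($strict, $replacement, $spaced);    // replaced
$b = preg_replace($strict, $replacement, $compact);   // left unchanged
$c = preg_replace($optional, $replacement, $compact); // replaced

echo $a, "\n", $b, "\n", $c, "\n";
```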
Upvotes: 4