tijagi

Reputation: 1244

Recursive regex doesn’t work

The string I'm working on looks like this:

abc {def ghi {jkl mno} pqr stv} xy z

I need to wrap whatever the curly braces contain in tags, so the result should look like this:

abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z

I’ve tried

'#(?<!\pL)\{  ( ([^{}]+) | (?R) )*  \}(?!\pL)#xu'

but what I get is just <tag>xy z</tag>. Help please, what am I doing wrong?

Upvotes: 4

Views: 1107

Answers (2)

Martin Ender

Reputation: 44259

Nested structures are beyond what regular expressions in the formal sense can describe (yes, PCRE supports recursion, but that does not help with this replacement problem). Even so, there are two options that still use regular expressions. First, you could simply replace opening brackets with opening tags and closing brackets with closing tags. This, however, converts unmatched brackets as well:

$str = preg_replace('/\{/', '<tag>', $str);
$str = preg_replace('/\}/', '</tag>', $str);
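To make the caveat about unmatched brackets concrete, here is a minimal sketch of the same idea. The function name `bracesToTagsNaive` and the second sample string are made up for illustration, and `str_replace` is used instead of `preg_replace` only because no pattern is needed for single characters.

```php
<?php
// Hypothetical helper: blindly turns every brace into a tag.
function bracesToTagsNaive(string $str): string
{
    // Plain string replacement; both replacements are applied in order.
    return str_replace(['{', '}'], ['<tag>', '</tag>'], $str);
}

echo bracesToTagsNaive('abc {def ghi {jkl mno} pqr stv} xy z'), "\n";
// abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z

// Unbalanced input is converted too, which yields broken markup.
echo bracesToTagsNaive('a } b { c'), "\n";
// a </tag> b <tag> c
```

This is fine when the input is known to be balanced; otherwise the output contains stray opening or closing tags.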

Another option is to only replace matching { and }, but then you have to do it repeatedly, because one call to preg_replace cannot replace multiple nested levels:

do
{
    // The group cannot contain a brace, so each pass replaces only the innermost pairs.
    $str = preg_replace('/\{([^{}]*)\}/', '<tag>$1</tag>', $str, -1, $count);
}
while ($count > 0);
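As a rough, self-contained version of that loop, the sketch below wraps it in a function and runs it on the string from the question. The name `bracesToTagsLoop` and the unbalanced sample are assumptions for the example; the character class `[^{}]*` excludes both braces, so a match can never span a nested pair.

```php
<?php
// Hypothetical helper: replaces matching brace pairs from the inside out.
function bracesToTagsLoop(string $str): string
{
    do {
        // $count receives the number of replacements made in this pass.
        $str = preg_replace('/\{([^{}]*)\}/', '<tag>$1</tag>', $str, -1, $count);
    } while ($count > 0); // stop once a pass finds no innermost pair

    return $str;
}

echo bracesToTagsLoop('abc {def ghi {jkl mno} pqr stv} xy z'), "\n";
// abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z

// Unmatched braces are left alone; only the balanced pair is converted.
echo bracesToTagsLoop('a } b { c {d}'), "\n";
// a } b { c <tag>d</tag>
```

The loop runs one pass per nesting level plus a final pass that finds nothing, so deep nesting costs more passes but never produces unbalanced tags.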

EDIT: While PCRE supports recursion with (?R), this will most likely not help with a replacement. The reason is that if a capturing group is repeated, its reference contains only the last capture (e.g. when matching /(a|b)+/ against aaaab, $1 contains b). I suppose the same holds for recursion. That is why you can only replace the innermost match: it is the last match of the capturing group within the recursion. Likewise, you cannot capture { and } with recursion and replace those, because they may also be matched an arbitrary number of times and only the last match would be replaced.
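The behaviour of a repeated capturing group is easy to check directly. This small sketch uses the pattern and subject from the paragraph above; the variable name `$m` is arbitrary.

```php
<?php
// The group (a|b) is repeated five times, but only one value is kept for it.
preg_match('/(a|b)+/', 'aaaab', $m);

var_dump($m[0]); // string(5) "aaaab"  (the whole match)
var_dump($m[1]); // string(1) "b"      (only the last repetition)
```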

Just matching a correctly nested structure and then replacing the innermost or outermost matching brackets will not help either (with one preg_replace call), because matches never overlap (so once three nested bracket pairs have been consumed by one match, the two inner pairs are not considered for further matches).
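One possible way around the non-overlapping limitation, not part of the answer above, is to let PHP do the descent: match one balanced block with a recursive pattern and process its content again inside a preg_replace_callback callback. The following is a sketch under that assumption; the function name `bracesToTagsRecursive` is hypothetical.

```php
<?php
// Hypothetical helper: the regex finds balanced blocks, PHP recursion handles nesting.
function bracesToTagsRecursive(string $str): string
{
    // Group 1 captures the content of one balanced {...} block;
    // (?R) re-enters the whole pattern to consume nested blocks.
    $pattern = '/\{((?:[^{}]++|(?R))*)\}/';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1] still contains the raw nested braces, so process it again.
            return '<tag>' . bracesToTagsRecursive($m[1]) . '</tag>';
        },
        $str
    );
}

echo bracesToTagsRecursive('abc {def ghi {jkl mno} pqr stv} xy z'), "\n";
// abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z
```

Here the regex recursion is only used to find where an outer block ends; the actual replacement of inner levels happens in the callback, so the "only the last capture survives" problem does not arise.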

Upvotes: 5

Brian White

Reputation: 8716

How about two steps:

s!{!<tag>!g;
s!}!</tag>!g;

(Perl syntax; translate to your language as appropriate)

or maybe this:

1 while s!{([^{}]*)}!<tag>$1</tag>!g;

Upvotes: 3
