Gal

Reputation: 23662

PHP - help with my REGEX-based recursive function

I'm extracting a string from the Wikipedia API that initially looks like this: link text. I want to strip every {{...}} block and everything inside it (which could be any kind of text). For that I thought of using a recursive function built on preg_match and preg_replace, something like:

function drop_brax($text)
{
    if(preg_match('/{{(.)*}}/',$text)) 
    return drop_brax(preg_replace('/{{(.)*}}/','',$text));
    return $text;
}

This function will not work because of a situation like this:

{{ I like mocachino {{ but I also like banana}} and frutis }}

This will peel off everything between the first occurrence of {{ and the first occurrence of }}, leaving "and frutis }}" behind. How can I do this properly, while keeping the nice recursive form?
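To make the failure concrete, here is a minimal sketch. The lazy pattern and the second sample string are assumptions added for illustration; they are not part of the function above. It shows that neither a lazy nor a greedy quantifier can pair the braces correctly on its own.

```php
<?php
$nested  = '{{ I like mocachino {{ but I also like banana}} and frutis }}';
$sibling = 'a {{x}} b {{y}} c';

// Lazy quantifier: stops at the first "}}", so the tail of the outer block is left behind.
$lazy_nested = preg_replace('/\{\{.*?\}\}/', '', $nested);

// Greedy quantifier: runs to the last "}}", so the text between two separate blocks is lost.
$greedy_sibling = preg_replace('/\{\{.*\}\}/', '', $sibling);

var_dump($lazy_nested, $greedy_sibling);
```

The first result still contains " and frutis }}", and the second has lost the " b " that sat between the two blocks.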

Upvotes: 2

Views: 712

Answers (2)

antpaw

Reputation: 15985

To handle nesting at any depth, you will need a parser:

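function drop_brax($str)
{
    $buffer = '';   // text collected from outside all braces
    $depth = 0;     // current brace nesting level
    $strlen_str = strlen($str);
    for($i = 0; $i < $strlen_str; $i++)
    {
        $char = $str[$i];

        switch ($char)
        {
            // note: every single brace counts, not only the {{ and }} pairs
            case '{':
                $depth++;
            break;
            case '}':
                $depth--;
            break;
            default:
                // keep the character only when we are outside every brace
                if ($depth === 0)
                {
                    $buffer .= $char;
                }
        }
    }
    return $buffer;
}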

$str = 'some text {{ I like mocachino {{ but I also like banana}} and frutis }} some text';
$str = drop_brax($str);
echo $str;

output (two spaces remain where the block was removed):

some text  some text
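One caveat with the loop above: it counts single braces, so a lone { or } in ordinary text also changes the depth and gets dropped. Below is a minimal sketch of a variant that reacts only to the {{ and }} pairs. The name drop_double_brax is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical variant: only "{{" and "}}" change the depth;
// single braces are treated as ordinary text.
function drop_double_brax(string $str): string
{
    $buffer = '';
    $depth = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $pair = substr($str, $i, 2);
        if ($pair === '{{') {
            $depth++;
            $i++; // skip the second brace of the pair
        } elseif ($pair === '}}' && $depth > 0) {
            $depth--;
            $i++; // skip the second brace of the pair
        } elseif ($depth === 0) {
            $buffer .= $str[$i]; // outside every block: keep the character
        }
    }
    return $buffer;
}

echo drop_double_brax('a {x} b {{ c {{ d }} e }} f');
```

Here the single-brace {x} survives, while the nested {{ ... }} block is removed as a whole. A stray }} with no opening pair is kept as text instead of pushing the depth negative.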

Upvotes: 0

Bart Kiers

Reputation: 170148

Try something like this:

$text = '...{{aa{{bb}}cc}}...{{aa{{bb{{cc}}bb{{cc}}bb}}dd}}...';
preg_match_all('/\{\{(?:[^{}]|(?R))*}}/', $text, $matches);
print_r($matches);

output:

Array
(
    [0] => Array
        (
            [0] => {{aa{{bb}}cc}}
            [1] => {{aa{{bb{{cc}}bb{{cc}}bb}}dd}}
        )
)

And a short explanation:

\{\{      # match two opening brackets
(?:       # start non-capturing group
  [^{}]   #   match any character except '{' and '}'
  |       #   OR
  (?R)    #   recursively call the entire pattern: \{\{(?:[^{}]|(?R))*}}
)         # end non-capturing group
*         # repeat the non-capturing group zero or more times
}}        # match two closing brackets
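Since the question asks to remove the blocks, not list them, the same pattern can be handed to preg_replace. This is a minimal sketch; the helper name strip_templates is hypothetical, and falling back to the input on a PCRE error is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper; the pattern is the recursive one explained above.
function strip_templates(string $text): string
{
    $result = preg_replace('/\{\{(?:[^{}]|(?R))*\}\}/', '', $text);
    // preg_replace returns null on a PCRE error (for example when a
    // backtracking or recursion limit is hit); fall back to the input then.
    return $result ?? $text;
}

echo strip_templates('some text {{ I like mocachino {{ but I also like banana}} and frutis }} some text');
```

The recursion happens inside the regex engine, so no recursive PHP function is needed. As with any removal of an inline block, the spaces on both sides of the removed block remain.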

Upvotes: 6
