I'm extracting a string from the Wikipedia API that initially looks like this: link text. I want to peel off every {{...}} block together with everything between the braces (which could be any kind of text). For that I thought about using a recursive function with preg_match and preg_replace, something like:
function drop_brax($text)
{
    if (preg_match('/{{(.)*}}/', $text))
        return drop_brax(preg_replace('/{{(.)*}}/', '', $text));
    return $text;
}
This function does not work in a situation like this:
{{ I like mocachino {{ but I also like banana}} and frutis }}
It peels off everything between the first occurrence of {{ and the first }} (and leaves " and frutis }}" behind). How can I do this properly, while keeping the nice recursive form?
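One way to keep the recursive shape is sketched below. This is not from either answer, and the function name drop_brax_innermost is hypothetical. Instead of a greedy (.)*, the pattern matches only innermost blocks, i.e. {{...}} with no brace inside. The function removes all of them and recurses until none remain, so each pass peels off one level of nesting.

```php
<?php
// Hypothetical helper (not from the original thread): strip {{...}} blocks
// by repeatedly removing the innermost ones, keeping the recursive form.
function drop_brax_innermost(string $text): string
{
    // Innermost block: "{{", then no braces at all, then "}}".
    $innermost = '/\{\{[^{}]*\}\}/';
    if (preg_match($innermost, $text)) {
        // Remove every innermost block, then recurse to peel the next level.
        return drop_brax_innermost(preg_replace($innermost, '', $text));
    }
    return $text;
}

$str = 'some text {{ I like mocachino {{ but I also like banana}} and frutis }} some text';
echo drop_brax_innermost($str), "\n";
// prints: some text  some text
```

Because [^{}]* forbids any brace inside a block, a block that contains a lone { or } is never matched and stays in the text.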
To handle arbitrary nesting you need a small parser that keeps track of the brace depth:
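function drop_brax($str)
{
    $buffer = '';
    $depth = 0;
    $strlen_str = strlen($str);
    for ($i = 0; $i < $strlen_str; $i++) {
        $char = $str[$i];
        switch ($char) {
            case '{':
                // Entering a (possibly nested) brace block.
                $depth++;
                break;
            case '}':
                // Leaving a block; ignore stray closing braces so the
                // depth never goes negative.
                if ($depth > 0) {
                    $depth--;
                }
                break;
            default:
                // Keep only characters that are outside every block.
                if ($depth === 0) {
                    $buffer .= $char;
                }
        }
    }
    return $buffer;
}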
$str = 'some text {{ I like mocachino {{ but I also like banana}} and frutis }} some text';
$str = drop_brax($str);
echo $str;
Output (note the two spaces where the block was removed):
some text  some text
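The parser above treats every single { and } as a delimiter, so a lone brace in ordinary text also shifts the depth. A sketch of a variant that only counts the two-character tokens {{ and }} is shown below; the name drop_double_brax is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical variant of the depth-counting parser: only "{{" and "}}"
// change the depth, so single braces in normal text are kept.
function drop_double_brax(string $str): string
{
    $buffer = '';
    $depth = 0;
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $pair = substr($str, $i, 2);
        if ($pair === '{{') {
            // Opening token: go one level deeper and skip both characters.
            $depth++;
            $i += 2;
        } elseif ($pair === '}}' && $depth > 0) {
            // Closing token inside a block: go one level up.
            $depth--;
            $i += 2;
        } else {
            // Ordinary character: keep it only at depth 0.
            if ($depth === 0) {
                $buffer .= $str[$i];
            }
            $i++;
        }
    }
    return $buffer;
}

echo drop_double_brax('a {b} {{ x {{ y }} z }} c'), "\n";
// prints: a {b}  c
```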
Try something like this:
$text = '...{{aa{{bb}}cc}}...{{aa{{bb{{cc}}bb{{cc}}bb}}dd}}...';
preg_match_all('/\{\{(?:[^{}]|(?R))*}}/', $text, $matches);
print_r($matches);
output:
Array
(
    [0] => Array
        (
            [0] => {{aa{{bb}}cc}}
            [1] => {{aa{{bb{{cc}}bb{{cc}}bb}}dd}}
        )

)
And a short explanation:
\{\{ # match two opening brackets
(?: # start non-capturing group 1
[^{}] # match any character except '{' and '}'
| # OR
(?R) # recursively call the entire pattern: \{\{(?:[^{}]|(?R))*}}
) # end non-capturing group 1
* # repeat non-capturing group 1 zero or more times
}} # match two closing brackets
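To remove the blocks rather than list them, the same recursive pattern can be passed to preg_replace. A minimal sketch, with a hypothetical wrapper name drop_brax_recursive:

```php
<?php
// Removal variant of the recursive pattern: instead of collecting the
// balanced {{...}} blocks with preg_match_all, replace them with ''.
function drop_brax_recursive(string $text): string
{
    // (?R) recurses into the whole pattern, so nested blocks are consumed
    // as part of their enclosing block.
    return preg_replace('/\{\{(?:[^{}]|(?R))*}}/', '', $text);
}

echo drop_brax_recursive('...{{aa{{bb}}cc}}...{{aa{{bb{{cc}}bb{{cc}}bb}}dd}}...'), "\n";
// prints: .........
```

Applied to the string from the question ('some text {{ I like mocachino {{ but I also like banana}} and frutis }} some text'), it returns "some text  some text" in a single call, with no explicit recursion in PHP.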