Reputation: 303
How are you? I'll get straight to the point.
I'm using a recursive regular expression that basically removes individual or nested <blockquote> tags. I only need to remove plain <blockquote> ... </blockquote> text, nested or not, and leave whatever is outside of these.
This regex does the job EXACTLY as I want (note the use of lookahead and recursion):
$comment=preg_replace('#<blockquote>((?!(</?blockquote>)).|(?R))*</blockquote>#s',"",$comment);
but it has a big problem: when $comment is large (more than 3500 characters long), Apache crashes (I assume a segmentation fault).
I need a solution to the problem, either by solving the crash, using a better regexp, or writing a custom function that does the same job.
If you simply have ideas on how to remove specific nested tags, they are very welcome.
Thank you in advance
Upvotes: 0
Views: 631
Reputation: 7646
Man, your pattern segfaults like crazy! Even a comment of several hundred bytes ends with a crash.
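The crash most likely comes from the pattern consuming one character per group iteration, which makes PCRE recurse very deeply on long input. If you want to stay with a single regex, a sketch like the one below matches runs of non-< characters possessively and should need far less stack. I have not checked it against every PCRE build, so treat it as an assumption rather than a guaranteed fix. The function name strip_blockquotes_regex is made up for illustration, and like your original pattern it only handles <blockquote> tags without attributes.

```php
<?php
// Hypothetical helper name; the pattern is a sketch, not a verified fix.
function strip_blockquotes_regex($comment) {
    // [^<]++  : a whole run of ordinary text, matched possessively
    // <(?!..) : a "<" that does not start a blockquote open/close tag
    // (?R)    : a nested <blockquote> ... </blockquote>
    $pattern = '#<blockquote>(?:[^<]++|<(?!/?blockquote>)|(?R))*+</blockquote>#';
    $result = preg_replace($pattern, '', $comment);
    if ($result === null) {
        // preg_replace() returns null on failure, e.g. when a PCRE limit is hit.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

echo strip_blockquotes_regex('a<blockquote>x<blockquote>y</blockquote>z</blockquote>b'), "\n"; // prints "ab"
```

Checking for null matters here: when PCRE gives up cleanly instead of crashing, preg_replace() returns null and the comment would otherwise be silently emptied.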
It's a lot simpler to use preg_split() to split up the string, then use a counter to keep track of how deep you are. Whenever the depth is greater than zero, you throw away the text. Here's the implementation:
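// Split on opening/closing blockquote tags, keeping the tags as tokens.
$tokens = preg_split('#(</?blockquote.*?>)#s', $comment, -1, PREG_SPLIT_DELIM_CAPTURE);
$outsideTokens = array();
$depth = 0;
// Tokens alternate: text, tag, text, tag, ..., text.
for ($token = reset($tokens); $token !== false; $token = next($tokens)) {
    if ($depth == 0) {
        $outsideTokens[] = $token; // text outside every blockquote
    }
    $delimiter = next($tokens);
    if ($delimiter === false) {
        break; // no tag follows the last text token
    }
    if ($delimiter[1] == '/') {
        $depth--; // closing tag
    } else {
        $depth++; // opening tag
    }
}
$comment = implode($outsideTokens);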
The code should work even when the start tag contains attributes.
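As a quick check of the approach, here is the same split-and-count idea wrapped in a function and run on a nested example with an attribute on the opening tag. The name strip_blockquotes is made up for illustration, and clamping the depth at zero for stray closing tags is my own assumption, not something the question asked for.

```php
<?php
// Hypothetical helper name; same split-and-count idea as the snippet above.
function strip_blockquotes($comment) {
    $tokens = preg_split('#(</?blockquote.*?>)#s', $comment, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    $depth = 0;
    foreach ($tokens as $i => $token) {
        if ($i % 2 == 0) {
            // Even positions are plain text between tags.
            if ($depth == 0) {
                $out .= $token;
            }
        } elseif ($token[1] == '/') {
            // Closing tag; never let a stray closer push the depth below zero (assumption).
            $depth = max(0, $depth - 1);
        } else {
            // Opening tag, with or without attributes.
            $depth++;
        }
    }
    return $out;
}

$html = 'before<blockquote class="q">one<blockquote>two</blockquote>three</blockquote>after';
echo strip_blockquotes($html), "\n"; // prints "beforeafter"
```

Because this is a single pass over the token array, memory and stack use stay flat no matter how long the comment is or how deeply the quotes are nested.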
Upvotes: 1