Dandy

Reputation: 303

PHP preg_replace crashes. Only for Regexp Masters

How are you? I'll get straight to the point.

I'm using a recursive regular expression that basically removes individual or nested <blockquote> tags. I only need to remove plain <blockquote> ... </blockquote> text, nested or not, and leave whatever is outside of these.

This regex does EXACTLY what I want (note the use of lookahead and recursion):

$comment=preg_replace('#<blockquote>((?!(</?blockquote>)).|(?R))*</blockquote>#s',"",$comment);

but it has a big problem: when $comment is large (more than 3500 characters long), Apache crashes (I assume a segmentation fault).

I need a solution to the problem: either a fix for the crash, a better regexp, or a custom function that does the same job.

If you simply have ideas on how to remove specific nested tags, they are welcome.

Thank you in advance

Upvotes: 0

Views: 631

Answers (1)

cleong

Reputation: 7646

Man, your pattern segfaults like crazy! Even a comment of several hundred bytes ends with a crash.
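
If you'd rather stay with a single regex, here is a rough sketch of a variant that may hold up better. The original pattern goes through the group once per character, which is probably what exhausts the stack in older PCRE builds; this version swallows whole runs of non-"<" characters with possessive quantifiers instead. The function name is hypothetical, and I'm assuming plain <blockquote> tags without attributes, as in the question. Treat it as something to test against your own data, not a guaranteed fix.

```php
<?php
// Hypothetical helper name, not part of the original post.
// Assumes plain <blockquote> tags without attributes, as in the question.
function strip_blockquotes_regex($comment)
{
    // [^<]++               : a run of non-"<" characters, matched possessively
    // <(?!/?blockquote>)   : a "<" that does not begin a blockquote tag
    // (?R)                 : a nested <blockquote>...</blockquote>
    $pattern = '#<blockquote>(?:[^<]++|<(?!/?blockquote>)|(?R))*+</blockquote>#';

    // preg_replace() returns null if PCRE reports an error (for example a
    // backtrack or recursion limit), so the caller should check for null.
    return preg_replace($pattern, '', $comment);
}

$result = strip_blockquotes_regex('a<blockquote>b<blockquote>c</blockquote>d</blockquote>e');
if ($result === null) {
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    echo $result, "\n";
}
```

If the call returns null, preg_last_error() tells you which limit was hit.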

It's a lot simpler to use preg_split() to split up the string, then use a counter to keep track of how deep you are. Whenever the depth is greater than zero, you throw away the text. Here's the implementation:

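// Split on opening/closing blockquote tags and keep the tags as tokens.
// Even indexes hold text, odd indexes hold the captured tags.
$tokens = preg_split('#(</?blockquote\b[^>]*>)#i', $comment, -1, PREG_SPLIT_DELIM_CAPTURE);
$outsideTokens = array();
$depth = 0;
foreach ($tokens as $i => $token) {
    if ($i % 2 == 0) {
        // Text segment: keep it only when outside every blockquote.
        if ($depth == 0) {
            $outsideTokens[] = $token;
        }
    } elseif ($token[1] == '/') {
        // Closing tag; never drop below zero on a stray </blockquote>.
        $depth = max(0, $depth - 1);
    } else {
        // Opening tag.
        $depth++;
    }
}
$comment = implode('', $outsideTokens);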

The code should work even when the start tag contains attributes.
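
To check that claim, here is a small sketch that wraps the same split-and-count idea in a function and runs it on a nested quote whose start tag carries an attribute. The function name and the sample string are made up for illustration, and returning the input unchanged when preg_split() fails is my own assumption about what you would want.

```php
<?php
// Hypothetical wrapper name; the sample input below is invented for the demo.
function strip_blockquotes($comment)
{
    // Odd-indexed tokens are the captured tags, even-indexed tokens are text.
    $tokens = preg_split('#(</?blockquote\b[^>]*>)#i', $comment, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($tokens === false) {
        // Assumption: on a PCRE failure, hand back the input untouched.
        return $comment;
    }

    $kept = array();
    $depth = 0;
    foreach ($tokens as $i => $token) {
        if ($i % 2 == 0) {
            // Text: keep it only at depth zero.
            if ($depth == 0) {
                $kept[] = $token;
            }
        } elseif ($token[1] == '/') {
            // Closing tag: go up one level, ignoring stray closers.
            $depth = max(0, $depth - 1);
        } else {
            // Opening tag, with or without attributes.
            $depth++;
        }
    }
    return implode('', $kept);
}

$sample = 'before <blockquote class="q">outer <blockquote>inner</blockquote> tail</blockquote>after';
echo strip_blockquotes($sample), "\n"; // prints "before after"
```

Since there is no recursion in the pattern, the size of the comment should only affect how many tokens the loop walks through.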

Upvotes: 1
