KeatsKelleher

Reputation: 10191

preg_replace throws seg fault

When I execute the following code, I get a segmentation fault every time. Is this a known bug? How can I make this code work?

<?php
$doc = file_get_contents("http://prairieprogressive.com/");
$replace = array(
    "/<script([\s\S])*?<\/ ?script>/",
    "/<style([\s\S])*?<\/ ?style>/",
    "/<!--([\s\S])*?-->/",
    "/\r\n/"
);
$doc = preg_replace($replace,"",$doc);
echo $doc;
?>

The error (obviously) looks like:

[root@localhost 2.0]# php test.php
Segmentation fault (core dumped)

Upvotes: 5

Views: 1253

Answers (5)

bcosca

Reputation: 17555

Your patterns repeat a capture group once per character, ([\s\S])*?, which puts unnecessary strain on PCRE's backtracking. Try this:

$replace = array(
    "/<script.*?<\/\s?script>/s",
    "/<style.*?<\/\s?style>/s",
    "/<!--.*?-->/s",
    "/\r\n/s"
);

Another thing: [\s\S] (whitespace or non-whitespace) matches any character at all, so you can just use . together with the s modifier, which lets . match newlines as well.
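
To make the effect concrete, here is a minimal sketch that runs capture-free patterns over a small sample string. The sample HTML is made up for illustration (the question fetched a live page instead), and the patterns have no > directly before the closing tag, so that script and style blocks with a body are matched.

```php
<?php
// Hypothetical sample input, standing in for the downloaded page.
$html = "<p>keep</p>\r\n<script type=\"text/javascript\">\nvar x = 1;\n</script>\r\n"
      . "<style>\nbody { color: red; }\n</style><!-- a\ncomment --><p>also keep</p>";

// No repeated capture group: "." plus the s modifier spans newlines.
$patterns = array(
    '/<script.*?<\/\s?script>/s',
    '/<style.*?<\/\s?style>/s',
    '/<!--.*?-->/s',
    '/\r\n/',
);

// Each pattern is applied in turn to the result of the previous one.
$clean = preg_replace($patterns, '', $html);
echo $clean, "\n";
```

With this input the script block, the style block, the comment and the CRLF pairs are removed, leaving only the two paragraphs.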

Upvotes: 2

netcoder

Reputation: 67745

What is the point of [\s\S]? It matches any whitespace character, and any non-whitespace character. If you replace it with .*, it works just fine.

EDIT: If you want to match new lines too, use the s modifier. In my opinion, it is easier to understand than a contradictory [\s\S].
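
A small sketch of the difference, using a made-up three-line style block: without the s modifier . stops at the first newline, while with it the pattern behaves like the [\s\S] version.

```php
<?php
// Hypothetical sample: a style block that spans several lines.
$text = "<style>\nbody {}\n</style>";

// Without the s modifier, "." does not match "\n", so nothing matches.
$without = preg_match('/<style.*?<\/style>/', $text);

// With the s modifier, "." matches newlines too.
$with = preg_match('/<style.*?<\/style>/s', $text);

// [\s\S] matches the same text without needing a modifier.
$class = preg_match('/<style[\s\S]*?<\/style>/', $text);

var_dump($without, $with, $class);
```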

Upvotes: 0

codaddict

Reputation: 455360

This seems to be a bug.

As you mentioned in the comments, the style regex is the one causing this. As a workaround, you can use the s modifier so that . also matches newlines:

$doc = preg_replace("/<style.*?<\/ ?style>/s",'',$doc);

Upvotes: 1

Jens Bradler

Reputation: 1

Try this (added the u modifier for Unicode and changed ([\s\S])*? to .*?):

<?php
$doc = file_get_contents("http://prairieprogressive.com/");
$replace = array(
    // s added so that . also matches newlines inside the blocks
    "#<script.*?</ ?script>#su",
    "#<style.*?</ ?style>#su",
    "#<!--.*?-->#su",
    "#\r\n#u"
);
$doc = preg_replace($replace,"",$doc);
echo $doc;
?>
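
One caveat with the u modifier, sketched below with a deliberately malformed sample string: the subject is then validated as UTF-8, and if a downloaded page turns out not to be valid UTF-8, preg_replace returns null instead of the cleaned text. Checking preg_last_error() right after the call shows why.

```php
<?php
// "\xFF" is never valid in UTF-8, so this subject is deliberately malformed.
$bad = "<!-- x -->caf\xFF";

// With u, the subject is validated as UTF-8 before matching.
$withU = preg_replace('#<!--.*?-->#u', '', $bad);
$error = preg_last_error(); // read immediately, the next preg_* call resets it

// Without u, the bytes are treated as single-byte characters.
$withoutU = preg_replace('#<!--.*?-->#', '', $bad);

var_dump($withU === null);
var_dump($error === PREG_BAD_UTF8_ERROR);
var_dump($withoutU === "caf\xFF");
```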

Upvotes: 0

KeatsKelleher

Reputation: 10191

OK! It seems the issue is with where the quantifier sits relative to the () capture group...

When I use

$doc = preg_replace("/<style([\s\S]*)<\/ ?style>/",'',$doc);

instead of

$doc = preg_replace("/<style([\s\S])*<\/ ?style>/",'',$doc);

it works!!
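
A likely explanation, assuming an older PCRE build without JIT: with ([\s\S])* the group is entered once per character, and each entry can cost a level of internal recursion, whereas ([\s\S]*) enters the group only once. The sketch below uses the second form, but lazily (*?), so that it stops at the first closing tag rather than the last one. The replace_or_fail helper is hypothetical, not part of PHP; it only catches the case where PCRE hits a configured limit (such as pcre.backtrack_limit or pcre.recursion_limit) and preg_replace returns null. It cannot catch an actual crash.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): runs preg_replace and turns a
 * silent PCRE failure (null return) into an exception.
 */
function replace_or_fail($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException(
            'preg_replace failed, PCRE error code ' . preg_last_error()
        );
    }
    return $result;
}

// Made-up sample input with one multi-line style block.
$doc = "<p>a</p><style>\nbody {}\n</style><p>b</p>";

// The quantifier sits inside the group, so the group is entered only once.
$out = replace_or_fail('/<style([\s\S]*?)<\/ ?style>/', '', $doc);
echo $out, "\n";
```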

Upvotes: 1
