Reputation: 6668
I want to strip comments and some other junk or tags from the <body>
section of an HTML document using PHP and regex, but my code does not work:
$str=preg_replace_callback('/<body>(.*?)<\/body>/s',
function($matches){
return '<body>'.preg_replace(array(
'/<!--(.|\s)*?-->/',
),
array(
'',
), $matches[1]).'</body>';
}, $str);
The problem is that nothing happens: the comments stay where they are, and none of the other cleanup is applied either. Can you help? Thanks!
EDIT:
Thanks to @mhall, I figured out that my regex did not work because of the attributes in the <body>
tag. I used his code and updated it to this:
$str = preg_replace_callback('/(?=<body(.*?)>)(.*?)(?<=<\/body>)/s',
function($matches) {
return preg_replace('/<!--.*?-->/s', '', $matches[2]);
}, $str);
This works perfectly!
Thanks, people!
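For later readers, here is a minimal sketch of why the attribute mattered. The sample markup and the variable names ($html, $strict, $loose) are made up for illustration, and the `<body[^>]*>` pattern is one possible way to allow attributes, not the only one (it assumes no attribute value contains a literal ">").

```php
<?php
// Hypothetical sample input: the <body> tag carries an attribute.
$html = '<html><body class="home"><p>Hi<!-- note --></p></body></html>';

// A literal "<body>" never matches "<body class=...>", so nothing is replaced.
$strict = preg_replace_callback('/<body>(.*?)<\/body>/s',
    function ($m) {
        return '<body>' . preg_replace('/<!--.*?-->/s', '', $m[1]) . '</body>';
    }, $html);

// "<body[^>]*>" also accepts attributes; the opening tag is captured
// in $m[1] and written back unchanged.
$loose = preg_replace_callback('/(<body[^>]*>)(.*?)(<\/body>)/s',
    function ($m) {
        return $m[1] . preg_replace('/<!--.*?-->/s', '', $m[2]) . $m[3];
    }, $html);

var_dump($strict === $html); // true: the input comes back untouched
echo $loose, PHP_EOL;        // the comment inside <body> is gone
```

Capturing the opening tag instead of hard-coding '<body>' in the return value keeps whatever attributes the original tag had.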
Upvotes: 2
Views: 301
Reputation: 50190
Aren't you making it too complicated? You don't need to jump in and out via a callback, since preg_replace will make replacements at every match:
$parts = explode("<body", $str, 2);
$clean = preg_replace('/<!--.*?-->/s', '', $parts[1]);
$str = $parts[0]."<body".$clean;
Splitting the string into head and body excludes the head from substitution without a lot of messy regexps.
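One caveat with the split: if the input contains no "<body" at all, explode returns a single element and $parts[1] does not exist. A small guarded sketch of the same idea follows; the function name stripBodyComments is hypothetical and only wraps the three lines above.

```php
<?php
// Hypothetical helper wrapping the explode-based approach.
function stripBodyComments(string $str): string
{
    // Split once at the first "<body"; everything before it is left alone.
    $parts = explode('<body', $str, 2);

    // No "<body" found: return the input unchanged instead of
    // reading an undefined $parts[1].
    if (count($parts) < 2) {
        return $str;
    }

    // Remove comments only from the part after "<body".
    $clean = preg_replace('/<!--.*?-->/s', '', $parts[1]);

    return $parts[0] . '<body' . $clean;
}

echo stripBodyComments('<head><!-- keep --></head><body><!-- x -->hi</body>'), PHP_EOL;
```

Note that explode is case-sensitive, so this sketch assumes the tag is written in lowercase.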
Note the s after the pattern: '/.../s'. This makes the dot in the regexp match embedded newlines along with other characters.
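To see the effect of the modifier in isolation, here is a short sketch with a made-up two-line comment (the variable names $body, $without and $with are illustrative only):

```php
<?php
// Hypothetical input with a comment that spans two lines.
$body = "a <!-- one\ntwo --> b";

// Without /s the dot stops at the newline, so the comment never matches.
$without = preg_replace('/<!--.*?-->/', '', $body);

// With /s the dot also matches "\n", so the whole comment is removed.
$with = preg_replace('/<!--.*?-->/s', '', $body);

var_dump($without === $body); // true: nothing was replaced
var_dump($with === "a  b");   // true: only the surrounding text remains
```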
Upvotes: 0
Reputation: 3701
Try this. I changed the preg_replace_callback pattern so that the match itself contains the body tags and the callback no longer has to add them back, and I replaced (.|\s) with a . in preg_replace. I also dropped the array syntax from that call and added a /s modifier:
$str = <<<EOS
<html>
<body>
<p>
Here is some <!-- One comment --> text
with a few <!--
Another comment
-->
Comments in it
</p>
</body>
</html>
EOS;
$str = preg_replace_callback('/(?=<body>)(.*?)(?<=<\/body>)/s',
function($matches) {
return preg_replace('/<!--.*?-->/s', '', $matches[1]);
}, $str);
echo $str, PHP_EOL;
Output:
<html>
<body>
<p>
Here is some text
with a few
Comments in it
</p>
</body>
</html>
Upvotes: 2