Leticia Meyer

Reputation: 167

PHP remove all html but comments

How would I remove everything from an HTML input except the comments? For example, this:

<html><body><!-- hello paragraph --><p>hello</p></body></html>

would turn into this:

<!-- hello paragraph -->

How would I do this? Thanks!

Edit: I know you can do things like this with regular expressions, but I don't know how.

Upvotes: 0

Views: 403

Answers (3)

meze

Reputation: 15087

Instead of replacing HTML, I'd extract all comments using:

preg_match_all('#(<!--.*?-->)#s', '<html><body><!-- hello paragraph --><p>hello</p></body></html>', $m);
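To get from the matches to the output the question asks for, the captured comments still need to be joined back together. A minimal sketch, assuming the input sits in a variable called $html (the names $html and $commentsOnly are chosen here for illustration only):

```php
<?php
// $html is an illustrative variable holding the input markup.
$html = '<html><body><!-- hello paragraph --><p>hello</p></body></html>';

// The "s" modifier lets "." match newlines, so multi-line comments are found too.
preg_match_all('#<!--.*?-->#s', $html, $m);

// $m[0] holds every full match; joining them keeps only the comments.
$commentsOnly = implode('', $m[0]);

echo $commentsOnly;
```

For the sample input this should print just the one comment. Pass a separator such as "\n" to implode() if each comment should end up on its own line.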

Upvotes: 1

user557846

Reputation:

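$foo = "<html><body><!-- hello paragraph --><p>hello</p></body></html>";
// preg_match_all finds every comment; the "s" modifier lets "." span newlines.
preg_match_all('/<!--(.*?)-->/s', $foo, $result);
print_r($result); // $result[0]: full comments, $result[1]: their inner text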

Upvotes: 0

mario

Reputation: 145482

That's indeed a bit more complex, but doable with regular expressions:

$text = preg_replace('~<(?!!--)/?\w[^>]*(?<!--)>~', "", $text);

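This strips the tags in your example but leaves the text between them (hello), and it can fail for other inputs. Amusingly, it also removes HTML tags from within comments. The same pattern in extended form, with comments: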

$regex = '~
    <             # opening html bracket
    (?!!--)       # negative assertion, no "!--" may follow
    /?\w          # tag name starts with a word character, after an optional /
    [^>]*         # matches html tag innards
    (?<!--)       # lookbehind assertion, no "--" before closing >
    >             # closing bracket
~x';
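As a rough check of what this pattern does and where it falls short, the sketch below runs it over the sample from the question and over a second, made-up input with a tag inside a comment. The function name strip_tags_keep_comments is hypothetical, introduced only for this illustration.

```php
<?php
// Hypothetical helper wrapping the tag-stripping pattern from this answer.
function strip_tags_keep_comments(string $text): string
{
    $regex = '~
        <             # opening html bracket
        (?!!--)       # not the start of a comment
        /?\w          # optional slash, then the first character of the tag name
        [^>]*         # the rest of the tag
        (?<!--)       # not the end of a comment
        >             # closing bracket
    ~x';
    return preg_replace($regex, '', $text);
}

// Sample from the question: the tags go, the comment and the text node stay.
$a = strip_tags_keep_comments('<html><body><!-- hello paragraph --><p>hello</p></body></html>');

// Made-up input: the tags inside the comment are stripped as well.
$b = strip_tags_keep_comments('<!-- <b>bold</b> note --><p>hi</p>');

echo $a, "\n", $b, "\n";
```

Because text between tags survives, extracting the comments with preg_match_all is likely the better fit if the output should contain nothing but comments.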

Upvotes: 0
