Reputation: 167
How would I remove everything from an HTML input except the comments? For example:
This: <html><body><!-- hello paragraph --><p>hello</p></body></html>
Would turn into this: <!-- hello paragraph -->
How would I do this? Thanks!
Edit: I know you can do this kind of thing with regular expressions, but I don't know how.
Upvotes: 0
Views: 403
Reputation: 15087
Instead of replacing HTML, I'd extract all comments using:
preg_match_all('#(<!--.*?-->)#s', '<html><body><!-- hello paragraph --><p>hello</p></body></html>', $m);
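To get from the matches to the output asked for in the question, the extracted comments can be joined back into one string. A minimal sketch, assuming the comments should simply be concatenated in document order (the variable names `$html` and `$commentsOnly` are mine, not part of the answer):

```php
<?php
// Sketch: keep only the comments by extracting them and joining the matches.
$html = '<html><body><!-- hello paragraph --><p>hello</p></body></html>';

// $m[0] holds every full match; the "s" modifier lets "." span newlines,
// so multi-line comments are matched too.
preg_match_all('#<!--.*?-->#s', $html, $m);

// Assumption: concatenate the comments with no separator.
$commentsOnly = implode('', $m[0]);
echo $commentsOnly, "\n"; // <!-- hello paragraph -->
```

If there are no comments at all, `$m[0]` is an empty array and the result is an empty string.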
Upvotes: 1
Reputation:
$foo="<html><body><!-- hello paragraph --><p>hello</p></body></html>";
preg_match('/(\<|&lt;)!--(\s*.*?\s*)--(\>|&gt;)/s',$foo,$result);
print_r($result);
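Note that `preg_match` stops at the first match, so only one comment is returned. A rough sketch of the same idea with `preg_match_all`, assuming the input may contain several comments and that only the trimmed comment text is wanted (the two-comment input and the names `$matches`, `$fullComments` and `$commentTexts` are made up for illustration):

```php
<?php
// Hypothetical input with two comments.
$foo = '<html><body><!-- first --><p>hello</p><!-- second --></body></html>';

// Group 1 captures the comment text without surrounding whitespace;
// the "s" modifier lets "." match newlines inside a comment.
preg_match_all('/<!--\s*(.*?)\s*-->/s', $foo, $matches);

$fullComments = $matches[0]; // complete comments, delimiters included
$commentTexts = $matches[1]; // inner text only
print_r($commentTexts);
```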
Upvotes: 0
Reputation: 145482
That's indeed a bit more complex, but doable with regular expressions:
$text = preg_replace('~<(?!!--)/?\w[^>]*(?<!--)>~', "", $text);
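This strips the tags from your example but leaves the text between them (here, hello) in place, and it can fail on other inputs. Amusingly, it also removes HTML tags from within comments. The same pattern, written out with the x flag and commented: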
$regex = '~
    <          # opening html bracket
    (?!!--)    # negative assertion, no "!--" may follow
    /?\w       # optional "/", then the tag name must start with a word character
    [^>]*      # matches html tag innards
    (?<!--)    # lookbehind assertion, no "--" before closing >
    >          # closing bracket
~x';
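To see what the pattern does and does not do, here is a small sketch that runs it on the question's example and on a second, made-up input with a tag inside a comment (`$nested` and the other variable names are assumptions for illustration):

```php
<?php
// Same pattern as above, in compact form.
$pattern = '~<(?!!--)/?\w[^>]*(?<!--)>~';

// The question's example: tags are removed, but text nodes are kept.
$example = '<html><body><!-- hello paragraph --><p>hello</p></body></html>';
$stripped = preg_replace($pattern, '', $example);
echo $stripped, "\n"; // <!-- hello paragraph -->hello

// Hypothetical input: tags inside the comment are stripped as well.
$nested = '<p>text</p><!-- <b>bold</b> note -->';
$strippedNested = preg_replace($pattern, '', $nested);
echo $strippedNested, "\n"; // text<!-- bold note -->
```

So the replacement removes tags rather than everything except comments. If the text between tags must go too, extracting the comments is likely the simpler route.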
Upvotes: 0