Reputation: 1311
I'm looking for a regexp, or a sequence of matches and replaces (preferably PHP, but it doesn't matter), to change the input below. The text at the start and end is just random text that needs to be preserved.
IN:
fkdshfks khh fdsfsk
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
<!--eg1-->
<div class="autoit" style="font-family:monospace;">
<span class="kw3">msgbox</span>
</div>
<!--gc2-->
<!--bXNnYm94-->
<!--egc2-->
<!--g2-->
</div>
<!--eg2-->
fdsfdskh
to this OUT:
fkdshfks khh fdsfsk
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
<div class="autoit" style="font-family:monospace;">
<span class="kw3">msgbox</span>
</div>
</div>
fdsfdskh
Thanks.
Upvotes: 52
Views: 83554
Reputation: 270
You can achieve this with modern JavaScript.
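function RemoveHtmlComments() {
  // Copy the live NodeList first, so removing nodes while looping skips nothing.
  // Note: this only looks at direct children of <body>, not nested comments.
  const children = Array.from(document.body.childNodes);
  for (const child of children) {
    if (child.nodeType === Node.COMMENT_NODE) child.remove();
  }
}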
It should be safer than RegEx.
Upvotes: 1
Reputation: 1196
I know this is quite an old post, but it may be useful to add an easy-to-implement PHP function that directly answers the original question.
/**
 * Strip all the HTML comments from $text
 *
 * @param string $text text to modify
 * @param string $new replacement string
 * @return array|string|string[]|null
 */
function strip_html_comments($text, $new = '') {
    $search = array("|<!--[\s\S]*?-->|si");
    $replace = array($new);
    return preg_replace($search, $replace, $text);
}
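For readers working outside PHP, the same idea can be sketched in JavaScript. This is only an illustration under stated assumptions: the name stripHtmlComments is made up, and swallowing one line break after each comment is an addition of this sketch (not part of the function above), so that the sample from the question comes out without blank lines.

```javascript
// Hypothetical JavaScript counterpart of strip_html_comments() above.
// Assumption: one line break directly after a comment is removed as well,
// so comments that sit on their own line do not leave empty lines behind.
function stripHtmlComments(text, replacement = '') {
  // [\s\S] matches any character including newlines; *? stops at the first -->.
  // A replacer function is used so that "$" in replacement is taken literally.
  return text.replace(/<!--[\s\S]*?-->(?:\r?\n)?/g, () => replacement);
}

const sample = [
  'fkdshfks khh fdsfsk',
  '<!--g1-->',
  "<div class='codetop'>CODE: AutoIt</div>",
  '<!--eg1-->',
  'fdsfdskh',
].join('\n');

// Prints the three non-comment lines of the sample, with no blank lines.
console.log(stripHtmlComments(sample));
```

If you need the line breaks untouched, drop the (?:\r?\n)? part of the pattern.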
Upvotes: 2
Reputation: 1095
If you just want the text, or the text with specific tags, you can handle this with PHP's strip_tags. It also deletes HTML comments, and you can keep the HTML tags you need, like this (passing the allowed tags as an array requires PHP 7.4 or later):
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text, ['p', 'a']);
the output will be:
<p>Test paragraph.</p> <a href="#fragment">Other text</a>
I hope it helps somebody!
Upvotes: 0
Reputation: 41
With the following regex:
/( )*<!--((.*)|[^<]*|[^!]*|[^-]*|[^>]*)-->\n*/g
you can remove multiline comments. I tested it with this string:
fkdshfks khh fdsfsk
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
<!--eg1-->
<div class="autoit" style="font-family:monospace;">
<span class="kw3">msgbox</span>
</div>
<!--gc2-->
<!--bXNnYm94-->
<!--egc2-->
<!--g2-->
</div>
<!--eg2-->
fdsfdskh
<!-- --
> test
- -->
<!-- --
<- test <
>
- -->
<!--
test !<
- <!--
-->
<script type="text/javascript">//<![CDATA[
var xxx = 'a';
//]]></script>
ok
Upvotes: 3
Reputation: 3711
A better version would be:
(?=<!--)([\s\S]*?)-->
It matches html comments like these:
<!--
multi line html comment
-->
or
<!-- single line html comment -->
Most importantly, it also matches comments like the one below, which patterns that forbid > inside the comment (such as <!--[^>]*-->) do not cover:
<!-- this is my blog: <mynixworld.inf> -->
Note
Although the string below is syntactically an HTML comment, your browser may give it a special meaning (it is an Internet Explorer conditional comment). Stripping such strings might break your page.
<!--[if !(IE 8) ]><!-->
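A quick way to check both claims is to run the patterns side by side. The snippet below is only a sketch in JavaScript; the variable names are made up, and the conditional-comment string is extended with a hypothetical <p>hi</p> body and closing marker so that the effect is visible.

```javascript
// Patterns quoted in this thread; the variable names are invented for this sketch.
const lazyAnyChar = /(?=<!--)([\s\S]*?)-->/g;
const noAngleBracket = /<!--[^>]*-->/g;

const blogComment = 'x<!-- this is my blog: <mynixworld.inf> -->y';
// Assumption: a complete downlevel-revealed block built around the line above.
const conditional = '<!--[if !(IE 8) ]><!--><p>hi</p><!--<![endif]-->';

// The lazy [\s\S]*? version removes the comment even though it contains ">".
const r1 = blogComment.replace(lazyAnyChar, '');
// The [^>]* version finds no match here, so the text stays as it was.
const r2 = blogComment.replace(noAngleBracket, '');
// Both conditional markers are matched as comments, only the <p> survives.
const r3 = conditional.replace(lazyAnyChar, '');

console.log(r1, r2, r3);
```

The third result shows why the note matters: the markup between the markers stays, but the condition that guarded it is gone.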
Upvotes: 41
Reputation: 11
// Note: these patterns strip JavaScript/CSS-style comments, not HTML comments.
// Remove multiline comments (/* ... */)
$mlcomment = '/\/\*(?!-)[\x00-\xff]*?\*\//';
$code = preg_replace($mlcomment, "", $code);
// Remove single line comments (// ...)
$slcomment = '/[^:]\/\/.*/';
$code = preg_replace($slcomment, "", $code);
// Collapse runs of whitespace into a single space
$extra_space = '/\s+/';
$code = preg_replace($extra_space, " ", $code);
// Remove spaces around punctuation where they are not needed
$removable_space = '/\s?([\{\};\=\(\)\/\+\*-])\s?/';
$code = preg_replace($removable_space, "\\1", $code);
Upvotes: 0
Reputation: 49
Here is my attempt:
<!--(?!<!)[^\[>][\s\S]*?-->
This will also remove multi line comments and won't remove downlevel-revealed or downlevel-hidden comments.
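To see what this pattern keeps and what it drops, here is a small sketch in JavaScript. The sample markup and the variable names are assumptions made up for the illustration.

```javascript
// The pattern from this answer, with the g flag so every comment is handled.
const keepConditionals = /<!--(?!<!)[^\[>][\s\S]*?-->/g;

// Hypothetical sample: one plain multi-line comment, one downlevel-hidden
// and one downlevel-revealed conditional comment.
const html = [
  '<!-- plain,',
  'multi line -->',
  '<!--[if IE]><p>IE only</p><![endif]-->',
  '<!--[if !IE]><!--><p>not IE</p><!--<![endif]-->',
].join('\n');

// Only the plain comment is removed; "<!--[", "<!-->" and "<!--<!" never match.
const cleaned = html.replace(keepConditionals, '');
console.log(cleaned);
```

The three guards do the work: [^\[>] rejects <!--[ and <!-->, and (?!<!) rejects <!--<![endif]-->.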
Upvotes: 2
Reputation: 11
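function remove_html_comments($html) {
    $expr = '/<!--[\s\S]*?-->/';
    // Let rhc() decide for each comment whether it is kept or removed.
    return preg_replace_callback($expr, 'rhc', $html);
}
function rhc($search) {
    list($l) = $search;
    // Keep conditional comments such as <!--[if IE]> and <![endif]-->.
    if (mb_eregi("\[if", $l) || mb_eregi("\[endif", $l)) {
        return $l;
    }
    // Everything else is replaced with an empty string.
    return '';
}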
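The same callback idea can be sketched in JavaScript, where String.prototype.replace accepts a function. The function name below is hypothetical, and the "[if" / "[endif" test is copied from the PHP above, so a plain comment that happens to contain "[if" would be kept as well.

```javascript
// Sketch of the callback approach: match every comment, then decide per match.
function removeHtmlCommentsKeepConditionals(html) {
  return html.replace(/<!--[\s\S]*?-->/g, (comment) =>
    // Keep anything that looks like a conditional comment marker.
    /\[(?:end)?if/i.test(comment) ? comment : ''
  );
}

const page = 'a<!-- x -->b<!--[if IE]>c<![endif]-->d';
console.log(removeHtmlCommentsKeepConditionals(page));
// prints: ab<!--[if IE]>c<![endif]-->d
```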
Upvotes: 1
Reputation: 334
<!--([\s\S]*?)-->
This also works in JavaScript and VBScript, because "." does not match line breaks in every language, whereas [\s\S] always does.
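A short JavaScript check of the difference (the sample string is made up). As far as I know, engines that support ES2018 also accept the s (dotAll) flag, which makes "." match line breaks too.

```javascript
const multiLine = 'a<!--\nmulti line html comment\n-->b';

// "." stops at line breaks, so this pattern finds nothing to replace.
const dotResult = multiLine.replace(/<!--(.*?)-->/g, '');
// [\s\S] matches any character, line breaks included.
const classResult = multiLine.replace(/<!--([\s\S]*?)-->/g, '');
// With the s flag, "." behaves like [\s\S].
const dotAllResult = multiLine.replace(/<!--(.*?)-->/gs, '');

console.log(dotResult === multiLine, classResult, dotAllResult);
// prints: true ab ab
```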
Upvotes: 2
Reputation: 875
Do not forget to consider conditional comments, as
<!--(.*?)-->
will remove them. Try this instead:
<!--[^\[](.*?)-->
This will also remove downlevel-revealed conditional comments, though.
EDIT:
This version won't remove downlevel-revealed or downlevel-hidden conditional comments:
<!--(?!<!)[^\[>].*?-->
Upvotes: 17
Reputation: 3820
This code also removes JavaScript code, which is too bad :|
Here is an example of JavaScript code that will be removed by it:
<script type="text/javascript"><!--
var xxx = 'a';
//-->
</script>
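One possible workaround, sketched in JavaScript, is to leave script blocks alone and strip comments only from the text between them. The function name is hypothetical, and finding script blocks with a regex is itself fragile (for example when "</script>" appears inside a string), so treat this as an illustration, not a complete fix.

```javascript
// Sketch: strip HTML comments everywhere except inside <script>...</script>.
function stripCommentsOutsideScripts(html) {
  // split() with a capturing group keeps the separators in the result,
  // so odd indexes hold the script blocks and even indexes hold the rest.
  const parts = html.split(/(<script\b[\s\S]*?<\/script>)/i);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : part.replace(/<!--[\s\S]*?-->/g, '')))
    .join('');
}

const scriptBlock =
  '<script type="text/javascript"><!--\nvar xxx = \'a\';\n//-->\n</script>';
const doc = 'a<!-- c -->b' + scriptBlock + 'c<!-- d -->';

console.log(stripCommentsOutsideScripts(doc) === 'ab' + scriptBlock + 'c');
// prints: true
```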
Upvotes: 1
Reputation: 37
Try the following if your comments contain line breaks:
/<!--(.|\n)*?-->/g
Upvotes: 2
Reputation: 591
preg_replace('/<!--(.*)-->/Uis', '', $html)
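This PHP code will remove all HTML comments from the $html string. The U modifier makes the quantifier ungreedy, i makes the match case-insensitive, and s lets "." match line breaks.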
Upvotes: 52
Reputation: 182772
Are you just trying to remove the comments? How about
s/<!--[^>]*-->//g
or the slightly better (suggested by the questioner himself):
<!--(.*?)-->
But remember, HTML is not regular, so using regular expressions to parse it will lead you into a world of hurt when somebody throws bizarre edge cases at it.
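As one concrete edge case (the markup is made up for illustration): comment-like text inside an attribute value is not a comment to an HTML parser, but a regex cannot tell the difference. A JavaScript sketch:

```javascript
// Hypothetical input: the first "<!-- ... -->" is attribute text, not a comment.
const tricky = '<input value="<!-- not a comment -->"> <!-- real comment -->';

// The regex removes both, silently changing the attribute value.
const stripped = tricky.replace(/<!--(.*?)-->/g, '');

console.log(JSON.stringify(stripped));
// prints: "<input value=\"\"> "
```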
Upvotes: 102