James Brooks

Reputation: 1311

RegExp to strip HTML comments

I'm looking for a sequence of regex matches and replacements (preferably in PHP, but any language will do) to change the following. The text at the start and end is just random text that needs to be preserved.

IN:

fkdshfks khh fdsfsk 
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
    <!--eg1-->
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
    <!--gc2-->
    <!--bXNnYm94-->
    <!--egc2-->
    <!--g2-->
</div>
<!--eg2-->
fdsfdskh

to this OUT:

fkdshfks khh fdsfsk 
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
</div>
fdsfdskh

Thanks.

Upvotes: 52

Views: 83554

Answers (15)

Kxmode

Reputation: 270

You can achieve this with modern JavaScript.

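function RemoveHtmlComments() {
    // childNodes is a live list, so copy it before removing anything.
    // Note: only direct children of <body> are checked.
    const children = Array.from(document.body.childNodes);
    for (const child of children) {
        if (child.nodeType === Node.COMMENT_NODE) child.remove();
    }
}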

It should be safer than RegEx.

Upvotes: 1

Clinton

Reputation: 1196

I know that this is quite an old post, but I felt that it would be useful to add to this post in case anyone wants an easy to implement PHP function that directly answers the original question.

/**
 * Strip all the html comments from $text
 *
 * @param $text - text to modify
 * @param string $new replacement string
 * @return array|string|string[]|null
 */
function strip_html_comments($text, $new=''){
    $search = array ("|<!--[\s\S]*?-->|si");
    $replace = array ($new);
    return preg_replace($search, $replace, $text);
}

Upvotes: 2

ThisIsWilliam

Reputation: 1095

If you just want the text, or the text with specific tags, you can use PHP's strip_tags. It also deletes HTML comments, and you can keep the HTML tags you need like this (passing the allowed tags as an array requires PHP 7.4 or later):

$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text, ['p', 'a']);

The output will be:

<p>Test paragraph.</p> <a href="#fragment">Other text</a>

I hope it helps somebody!

Upvotes: 0

davlem

Reputation: 41

With the following pattern:

/( )*<!--((.*)|[^<]*|[^!]*|[^-]*|[^>]*)-->\n*/g

you can remove multi-line comments. I used this test string:

fkdshfks khh fdsfsk 
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
    <div class='geshimain'>
    <!--eg1-->
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
    <!--gc2-->
    <!--bXNnYm94-->
    <!--egc2-->
    <!--g2-->
</div>
<!--eg2-->
fdsfdskh

<!-- --
> test
- -->

<!-- --
<- test <
>
- -->

<!--
test !<
- <!--
-->

<script type="text/javascript">//<![CDATA[
    var xxx = 'a';   
    //]]></script>

ok

Upvotes: 3

Eugen Mihailescu

Reputation: 3711

A better version would be:

(?=<!--)([\s\S]*?)-->

It matches html comments like these:

<!--
multi line html comment
-->

or

<!-- single line html comment -->

and, most importantly, it matches comments like this (the other regexes shown here do not cover this case):

<!-- this is my blog: <mynixworld.inf> -->
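To check which patterns cope with a ">" inside a comment, here is a small JavaScript sketch. The surrounding "before" and "after" text is a hypothetical sample, not from the question. It suggests the limitation applies to the negated-class pattern <!--[^>]*-->, while the lazy <!--(.*?)--> also handles this single-line case.

```javascript
// Hypothetical sample text wrapped around the comment shown above.
const html = "before <!-- this is my blog: <mynixworld.inf> --> after";

// [^>]* cannot cross the ">" inside the comment, so nothing is removed.
const negatedClass = html.replace(/<!--[^>]*-->/g, "");

// The lookahead version and the plain lazy version both reach the closing "-->".
const lookahead = html.replace(/(?=<!--)([\s\S]*?)-->/g, "");
const lazyDot = html.replace(/<!--(.*?)-->/g, "");

console.log(negatedClass === html); // true (comment still there)
console.log(lookahead);             // "before  after"
console.log(lazyDot);               // "before  after"
```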

Note

Although the one below is syntactically an HTML comment, your browser might parse it differently, so it might have a special meaning. Stripping such strings might break your code.

<!--[if !(IE 8) ]><!-->

Upvotes: 41

Mister X

Reputation: 11

// Remove multiline comment
    $mlcomment = '/\/\*(?!-)[\x00-\xff]*?\*\//';
    $code = preg_replace ($mlcomment, "", $code);
// Remove single line comment
    $slcomment = '/[^:]\/\/.*/';
    $code = preg_replace ($slcomment, "", $code);
// Remove extra spaces
    $extra_space = '/\s+/';
    $code = preg_replace ($extra_space, " ", $code);
// Remove spaces that can be removed
    // Note: '\\\/' inside the class would end the pattern early, so only '\/' is used here
    $removable_space = '/\s?([\{\};\=\(\)\/\+\*-])\s?/';
    $code = preg_replace ($removable_space, "\\1", $code);

Upvotes: 0

Alexandr Kondrashov

Reputation: 49

Here is my attempt:

<!--(?!<!)[^\[>][\s\S]*?-->

This will also remove multi line comments and won't remove downlevel-revealed or downlevel-hidden comments.
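As a rough check of that claim, here is a JavaScript sketch that runs the pattern over one ordinary comment and two conditional comments. The sample markup is hypothetical; only the pattern comes from the answer.

```javascript
// Hypothetical sample: one plain comment, one downlevel-hidden and one
// downlevel-revealed conditional comment.
const html = [
  "<!-- plain comment -->",
  "<!--[if IE 8]><p>IE 8 only</p><![endif]-->",
  "<!--[if !IE]><!--><p>other browsers</p><!--<![endif]-->",
  "<p>content</p>",
].join("\n");

// (?!<!) rejects "<!--<![endif]-->", and [^\[>] rejects "<!--[" and "<!-->".
const pattern = /<!--(?!<!)[^\[>][\s\S]*?-->/g;
const stripped = html.replace(pattern, "");

console.log(stripped);
```

Only the plain comment is removed here. One thing to watch: the pattern needs at least one character after "<!--", so an empty comment such as <!----> is not matched on its own and the match may run on to a later "-->".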

Upvotes: 2

TurkiM

Reputation: 11

function remove_html_comments($html) {
   $expr = '/<!--[\s\S]*?-->/';
   $func = 'rhc';
   $html = preg_replace_callback($expr, $func, $html);
   return $html;
}

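function rhc($search) {
   list($l) = $search;
   // Keep conditional comments such as <!--[if IE]> ... <![endif]-->
   if (mb_eregi("\[if",$l) || mb_eregi("\[endif",$l) )  {
      return $l;
   }
   // Every other comment is replaced with an empty string
   return '';
}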

Upvotes: 1

Toshinou Kyouko

Reputation: 334

<!--([\s\S]*?)-->

This works in JavaScript and VBScript too, since [\s\S] matches line breaks even in languages where "." does not.
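A short JavaScript sketch of the difference, using a hypothetical input with one single-line and one multi-line comment:

```javascript
// Hypothetical sample input: one single-line and one multi-line comment.
const html = "a<!-- one -->b<!--\ntwo\n-->c";

// Without the "s" (dotAll) flag, "." stops at line breaks,
// so the multi-line comment survives.
const withDot = html.replace(/<!--(.*?)-->/g, "");

// [\s\S] matches any character, including line breaks.
const withClass = html.replace(/<!--([\s\S]*?)-->/g, "");

console.log(JSON.stringify(withDot));   // "ab<!--\ntwo\n-->c"
console.log(JSON.stringify(withClass)); // "abc"
```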

Upvotes: 2

Pierre Wahlgren

Reputation: 875

Do not forget to consider conditional comments, as

<!--(.*?)-->

will remove them. Try this instead:

<!--[^\[](.*?)-->

This will also remove downlevel-revealed conditional comments, though.

EDIT:

This won't remove downlevel-revealed or downlevel-hidden comments.

<!--(?!<!)[^\[>].*?-->

Upvotes: 17

TomSawyer

Reputation: 3820

This code also removes JavaScript code, which is too bad :|

Here is an example of JavaScript code that will be removed by it:

<script type="text/javascript"><!--
    var xxx = 'a';
    //-->
    </script>
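One possible workaround, sketched in JavaScript: match whole script blocks first and hand them back unchanged, so only comments outside them are stripped. The function name stripCommentsOutsideScripts is hypothetical. This is still regex-based, so it shares the usual limits of processing HTML with regular expressions, and other raw-text elements such as style would need the same treatment.

```javascript
// Hypothetical helper: strips HTML comments but leaves <script> blocks untouched.
function stripCommentsOutsideScripts(html) {
  // First alternative captures a whole <script>...</script> block,
  // second alternative matches an ordinary comment.
  const pattern = /(<script\b[\s\S]*?<\/script\s*>)|<!--[\s\S]*?-->/gi;
  // Script blocks are returned unchanged; comments are replaced with nothing.
  return html.replace(pattern, (match, scriptBlock) => (scriptBlock ? match : ""));
}

// The script block from the example above, preceded by a hypothetical comment.
const page = [
  "<!-- note -->",
  '<script type="text/javascript"><!--',
  "    var xxx = 'a';",
  "    //-->",
  "    </script>",
].join("\n");

const cleaned = stripCommentsOutsideScripts(page);
console.log(cleaned);
```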

Upvotes: 1

Hadrian

Reputation: 37

Try the following if your comments contain line breaks:

/<!--(.|\n)*?-->/g

Upvotes: 2

Benoit Villière

Reputation: 591

preg_replace('/<!--(.*)-->/Uis', '', $html)

This PHP code will remove all html comment tags from the $html string.
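In PCRE the U modifier inverts greediness, so (.*) behaves like (.*?), and the s modifier lets "." match line breaks. JavaScript has no U flag, so a rough equivalent spells out the lazy quantifier. The sketch below uses a hypothetical input to show what goes wrong if the match stays greedy.

```javascript
// Hypothetical sample: two comments, the second spanning two lines.
const html = "<p>one</p><!-- a -->\n<p>two</p><!-- b\nc -->";

// Greedy: runs from the first "<!--" to the last "-->", taking <p>two</p> with it.
const greedy = html.replace(/<!--(.*)-->/gs, "");

// Lazy: stops at the nearest "-->", like (.*) under the U modifier in PHP.
const lazy = html.replace(/<!--(.*?)-->/gs, "");

console.log(JSON.stringify(greedy)); // "<p>one</p>"
console.log(JSON.stringify(lazy));   // "<p>one</p>\n<p>two</p>"
```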

Upvotes: 52

Paul Tomblin

Reputation: 182772

Are you just trying to remove the comments? How about

s/<!--[^>]*-->//g

or the slightly better (suggested by the questioner himself):

<!--(.*?)-->

But remember, HTML is not regular, so using regular expressions to parse it will lead you into a world of hurt when somebody throws bizarre edge cases at it.

Upvotes: 102

James Brooks

Reputation: 1311

Ah I've done it,

<!--(.*?)-->
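Replacing each match with an empty string leaves the now-empty lines behind, whereas the OUT in the question has them removed. A minimal JavaScript sketch that also drops those lines is below. The function name stripCommentLines and the shortened sample are assumptions for illustration.

```javascript
// Hypothetical helper: removes comments and the lines they occupied.
function stripCommentLines(text) {
  return text
    // Pass 1: a comment alone on its line (optionally indented) is removed
    // together with its line break. (?:(?!-->)[\s\S])* cannot run past the first "-->".
    .replace(/^[ \t]*<!--(?:(?!-->)[\s\S])*-->[ \t]*(?:\r?\n|$)/gm, "")
    // Pass 2: comments that share a line with other content.
    .replace(/<!--[\s\S]*?-->/g, "");
}

// Shortened version of the input from the question.
const sample = [
  "start text",
  "<!--g1-->",
  "<div class='geshimain'>",
  "    <!--eg1-->",
  '    <span class="kw3">msgbox</span>',
  "    <!--gc2-->",
  "    <!--bXNnYm94-->",
  "</div>",
  "<!--eg2-->",
  "end text",
].join("\n");

const result = stripCommentLines(sample);
console.log(result);
```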

Upvotes: 9
