Rich

Reputation: 1156

Remove almost all HTML comments using Regex

Using this regular expression:

preg_replace( '/<!--(?!<!)[^\[>].*?-->/', '', $output )

I'm able to remove all HTML comments from my page except for anything that looks like this:

<!--[if IE 6]>
    Special instructions for IE 6 here
<![endif]-->

How can I modify this so that it also leaves in place any HTML comment that contains a unique phrase, such as "batcache"?

So an HTML comment like this:

<!--
generated 37 seconds ago
generated in 0.978 seconds
served from batcache in 0.004 seconds
expires in 263 seconds
-->

won't be removed.


This code seems to do the trick:

preg_replace_callback( '/<!--([\s\S]*?)-->/', function( $c ) { return ( strpos( $c[1], '<![' ) !== false || strpos( $c[1], 'batcache' ) !== false ) ? $c[0] : ''; }, $output )
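
For readability, the same callback idea can be spread over several lines. The following is only a sketch: the function name strip_html_comments and the $keep parameter are made up for illustration, and it relies on preg_replace_callback, the PCRE function that accepts a callable as the replacement.

```php
<?php
/**
 * Remove HTML comments except those containing one of the given markers.
 * The function name and the $keep parameter are illustrative assumptions.
 */
function strip_html_comments(string $html, array $keep = ['<![', 'batcache']): string
{
    $result = preg_replace_callback(
        '/<!--([\s\S]*?)-->/',
        function (array $m) use ($keep): string {
            foreach ($keep as $marker) {
                if (strpos($m[1], $marker) !== false) {
                    return $m[0]; // keep the whole comment unchanged
                }
            }
            return ''; // drop every other comment
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

// Hypothetical sample input.
$output = "<p>a</p><!-- note --><!--[if IE 6]>x<![endif]--><!-- served from batcache -->";
echo strip_html_comments($output);
```

Passing the markers as an array means further phrases can be protected without touching the pattern itself.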

Upvotes: 2

Views: 1594

Answers (1)

ntrp

Reputation: 401

This should remove all comments that don't contain "batcache" or "[endif]". Matching runs from the opening <!-- to the closing -->.

$result = preg_replace("/<!--((?!batcache)(?!\\[endif\\])[\\s\\S])*?-->/", "", $str);
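
As a quick check of the pattern above, the sketch below runs it against three kinds of comment. The sample markup is made up for illustration; only the regex comes from this answer.

```php
<?php
// Hypothetical sample input: a plain comment, an IE conditional comment,
// and a batcache comment.
$str = <<<'HTML'
<p>one</p><!-- plain comment -->
<!--[if IE 6]>
    Special instructions for IE 6 here
<![endif]-->
<!--
served from batcache in 0.004 seconds
-->
<p>two</p>
HTML;

// Each character inside the comment is consumed only if neither "batcache"
// nor "[endif]" starts at that position, so comments containing either
// phrase can never reach the closing --> and are left alone.
$result = preg_replace("/<!--((?!batcache)(?!\\[endif\\])[\\s\\S])*?-->/", "", $str);

echo $result;
```

With this input, only the plain comment should disappear; the conditional comment and the batcache comment stay in the output.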

You can test it here.

As other users have already pointed out, parsing HTML with regex isn't always safe, but if you have reasonable assurance about the kind of HTML you'll be parsing, it should work as expected. If the regex doesn't handle a particular use case, let me know.

Upvotes: 2
