Mayko

Reputation: 439

PHP regular expression to remove all javascript with exception

I'm looking for a way to remove all JavaScript <script> tags from an HTML string.

The following regex works fine, but I would like to add an exception:

$html = preg_replace('#<script[^>]*>.*?</script>#is', '', $html);

How can I add a rule so that scripts of type text/html are left untouched?

<script type="text/html" ... > ... </script> 

Any suggestions?

Thanks in advance.

Upvotes: 3

Views: 3230

Answers (2)

Nahydrin

Reputation: 13517

Use a greedy match that isn't defeated by the cases Mike points out, like so:

$html = preg_replace('#<script.*</script>#is', '', $html);

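This greedily matches everything from the first <script to the last </script>, so it removes every script tag, but also any markup that sits between separate script blocks. As for the exception, I'm not sure how to do that, sorry.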
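For what it's worth, the text/html exception from the question could be sketched with a negative lookahead placed right after the tag name. This is only a rough sketch, not a sanitizer: the function name strip_scripts_except_templates is made up for illustration, and it assumes the type attribute is written as type=text/html, with or without quotes, inside the opening tag.

```php
<?php
// Hypothetical helper: remove <script> elements unless the opening tag
// carries type="text/html".
function strip_scripts_except_templates(string $html): string
{
    // (?![^>]*\btype\s*=\s*["']?text/html\b) is a negative lookahead: it
    // rejects an opening tag that contains type=text/html before its ">".
    // The lazy .*? stops at the first closing tag, and </script\s*> also
    // accepts whitespace before the final ">".
    $pattern = '#<script\b(?![^>]*\btype\s*=\s*["\']?text/html\b)[^>]*>.*?</script\s*>#is';

    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

$html = '<p>a</p><script>alert(1)</script>'
      . '<script type="text/html"><b>tpl</b></script>'
      . '<script type="text/javascript">alert(2)</script><p>b</p>';

echo strip_scripts_except_templates($html);
// prints: <p>a</p><script type="text/html"><b>tpl</b></script><p>b</p>
```

Unlike the greedy pattern, the lazy quantifier keeps the markup between separate script blocks intact.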

Upvotes: 1

Mike Samuel

Reputation: 120516

You may not be trying to sanitize untrusted HTML, but just so readers of this question don't get the wrong idea:

It won't remove JavaScript outside <script> elements: <img src=bogus onerror=alert(42)>.

It won't remove barely obfuscated scripts: <script>alert(42)</script >.

It will turn invalid content into scripts: <scrip<script></script>t>alert(42)</script>.

I'm not saying this is what you're trying to do. You may have perfectly good reasons for doing this that don't have to do with untrusted inputs, but, for later readers, don't try to roll your own HTML sanitizer with just regular expressions.
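If the input is trusted and the goal is only to drop script elements while keeping text/html templates, one common alternative to a regex is to let an HTML parser find the elements. The following is a minimal sketch under those assumptions, using PHP's bundled DOM extension; the function name remove_scripts_keep_templates is hypothetical. It is still not a sanitizer: event-handler attributes such as onerror are left in place.

```php
<?php
// Hypothetical helper: parse the markup, then remove every <script>
// element whose type attribute is not text/html.
function remove_scripts_keep_templates(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it to an array
    // before removing nodes from the document.
    $scripts = iterator_to_array($doc->getElementsByTagName('script'));

    foreach ($scripts as $script) {
        $type = strtolower(trim($script->getAttribute('type')));
        if ($type !== 'text/html' && $script->parentNode !== null) {
            $script->parentNode->removeChild($script);
        }
    }

    // saveHTML() serializes the whole document, including the html and
    // body wrappers the parser adds around a fragment.
    return $doc->saveHTML();
}

echo remove_scripts_keep_templates(
    '<p>a</p><script>alert(1)</script>'
    . '<script type="text/html"><b>tpl</b></script>'
);
```

Note that loadHTML() treats input without a declared charset as ISO-8859-1, so non-ASCII content would need extra handling.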

Upvotes: 3
