Reputation: 439
I'm looking for a way to remove all JavaScript tags from an HTML string.
The following regex works fine, but I would like to add an exception:
$html = preg_replace('#<script[^>]*>.*?</script>#is', '', $html);
How can I add a rule so that scripts of type text/html are ignored?
<script type="text/html" ... > ... </script>
Any suggestion?
Thanks in advance.
Upvotes: 3
Views: 3230
Reputation: 13517
Use a greedy match that won't fall to Mike's pointers, like so:
$html = preg_replace('#<script.*</script>#is', '', $html);
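This matches greedily, from the first <script to the last </script> in the string, so it removes every script tag, along with any markup that sits between separate script elements. As for the exception, I'm not sure how to do that, sorry.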
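For the type text/html exception asked about in the question, one possible approach is a negative lookahead placed right after the tag name. The sketch below builds on the lazy pattern from the question rather than the greedy one; the function name strip_scripts_except_templates and the sample markup are made up for illustration. It assumes the type attribute is written inside the opening tag, quoted or unquoted, and it would also be fooled by an attribute such as data-type="text/html".

```php
<?php
// Sketch: remove <script> elements unless the opening tag declares type="text/html".
// (?![^>]*\btype\s*=\s*["']?text/html\b) is a negative lookahead: it rejects an
// opening tag if a type=text/html attribute appears anywhere before its closing ">".
function strip_scripts_except_templates(string $html): string
{
    $pattern = '#<script\b(?![^>]*\btype\s*=\s*["\']?text/html\b)[^>]*>.*?</script\s*>#is';

    // preg_replace returns null on a PCRE error; fall back to the input in that case.
    return preg_replace($pattern, '', $html) ?? $html;
}

// Hypothetical sample input: two ordinary scripts around one text/html template.
$html = '<p>a</p><script>alert(1)</script>'
      . '<script type="text/html" id="tpl"><b>kept</b></script>'
      . '<script type="text/javascript">alert(2)</script><p>b</p>';

echo strip_scripts_except_templates($html), "\n";
```

With the sample input, only the text/html block should survive between the two paragraphs. This is still regex-based tag stripping, so it is not a substitute for a real HTML sanitizer.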
Upvotes: 1
Reputation: 120516
You may not be trying to sanitize untrusted HTML, but just so readers of this question don't get the wrong idea:
It won't remove JavaScript outside <script> elements: <img src=bogus onerror=alert(42)>
It won't remove barely obfuscated scripts: <script>alert(42)</script >
It will turn invalid content into scripts: <scrip<script></script>t>alert(42)</script>
I'm not saying this is what you're trying to do. You may have perfectly good reasons for doing this that don't have to do with untrusted inputs, but, for later readers, don't try to roll your own HTML sanitizer with just regular expressions.
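The three cases above can be checked directly against the regex from the question. This is only a small demonstration; the function name strip_scripts_naive is made up for the example.

```php
<?php
// The regex from the question, wrapped in a helper for the demonstration.
function strip_scripts_naive(string $html): string
{
    return preg_replace('#<script[^>]*>.*?</script>#is', '', $html);
}

// 1. Script in an event-handler attribute: there is no <script> tag to match,
//    so the input comes back unchanged.
$attr = strip_scripts_naive('<img src=bogus onerror=alert(42)>');

// 2. A space before the closing ">" defeats the literal "</script>" in the
//    pattern, so nothing is removed.
$spaced = strip_scripts_naive('<script>alert(42)</script >');

// 3. Removing the inner "<script></script>" joins "<scrip" and "t>" into a
//    new, working script tag.
$nested = strip_scripts_naive('<scrip<script></script>t>alert(42)</script>');

echo $attr, "\n", $spaced, "\n", $nested, "\n";
```

In all three cases the output still contains markup that a browser would execute.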
Upvotes: 3