Marios Ledio Taso

Reputation: 21

Regex to remove <script> and its content in JavaScript

I am trying to remove scripts and their content from an HTML body, and this is what I have come up with so far:

just_text = just_text.replace(/<\s*script[^>]*>(<\s*\/script[^>]*>|$)/ig, '');

It does not work as I want it to; I still get the content.

Can you please help me?

Thank you

Upvotes: 0

Views: 356

Answers (1)

Felix Kling

Reputation: 816384

The answer to such questions is always the same: Don't use regular expressions. Instead, parse the HTML, modify the DOM and serialize it back to HTML if you need to.

Example:

var container = document.createElement('div');
container.innerHTML = just_text;

// find and remove `script` elements
var scripts = container.getElementsByTagName('script');
for (var i = scripts.length; i--; ) {
    scripts[i].parentNode.removeChild(scripts[i]);
}

just_text = container.innerHTML;

If you want to remove the script tags from the page itself, it's basically the same:

var scripts = document.body.getElementsByTagName('script');
for (var i = scripts.length; i--; ) {
    scripts[i].parentNode.removeChild(scripts[i]);
}
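
As for why the pattern in the question leaves the content behind: as far as I can tell, it only matches an opening script tag that is followed immediately by a closing tag or by the end of the string, so nothing in it ever consumes the script body. A minimal sketch to check this, where the two sample strings are made up for illustration:

```javascript
// Pattern copied verbatim from the question.
const pattern = /<\s*script[^>]*>(<\s*\/script[^>]*>|$)/ig;

// Hypothetical inputs, for illustration only.
const withBody = '<p>Hi</p><script>alert(1)</script><p>Bye</p>';
const emptyBody = '<p>Hi</p><script></script><p>Bye</p>';

// The opening tag is followed by "alert(1)", not by a closing tag
// or the end of the string, so the pattern finds no match at all.
const withBodyResult = withBody.replace(pattern, '');

// Only an empty script element fits the pattern.
const emptyBodyResult = emptyBody.replace(pattern, '');

console.log(withBodyResult === withBody); // true: input is unchanged
console.log(emptyBodyResult);             // <p>Hi</p><p>Bye</p>
```

Patching the pattern to swallow the body would still leave it fragile against things like a closing tag inside a string literal, which is why the DOM approach above is the safer route.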

Upvotes: 6
