i_trope

Reputation: 1604

Parse contents of script tags inside string

Let's say I have the following string:

var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>"

I would like to use split to get an array with the contents of the script tags. For example, I want my output to be:

["console.log('hello')", "console.log('world')"]

I tried myString.split(/[<script></script>]/), but did not get the expected output.

Any help is appreciated.

Upvotes: 7

Views: 12648

Answers (3)

Ritesh Karwa

Reputation: 2254

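JavaScript code:

function myFunction() {
    var str = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";

    // With the g flag, match() returns whole matches (tags included),
    // so collect capture group 1 from each matchAll() result instead.
    var contents = Array.from(str.matchAll(/<script\b[^>]*>([\s\S]*?)<\/script>/g), function (m) {
        return m[1];
    });
    console.log(contents); // ["console.log('hello')", "console.log('world')"]
}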

Upvotes: 2

Oriol

Reputation: 288120

You can't parse (X)HTML with regex.
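As a rough illustration of why, here is a minimal sketch using a made-up input (the string `tricky` is an assumption, not taken from the question). The first script element sits inside an HTML comment, so an HTML parser would not treat it as a script, yet a tag-matching regex still reports it:

```javascript
// Illustrative input (an assumption, not from the question): the first
// script element is commented out, so a browser would not treat it as a script.
var tricky = "<!-- <script>ignored()</script> --><script>real()</script>";

// Naive regex extraction: collect capture group 1 of every match.
var found = Array.from(tricky.matchAll(/<script\b[^>]*>([\s\S]*?)<\/script>/g), function (m) {
    return m[1];
});

console.log(found); // ["ignored()", "real()"]
```

The regex only sees characters, so it cannot tell a commented-out tag from a real one.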

Instead, you can parse it using innerHTML.

var element = document.createElement('div');
element.innerHTML = myString; // Parse HTML properly (but unsafely)

However, this is not safe. Even if innerHTML doesn't run the JS inside script elements, malicious strings can still run arbitrary JS, e.g. with <img src="//" onerror="alert()">.

To avoid that problem, you can use DOMImplementation.createHTMLDocument to create a new document, which can be used as a sandbox.

var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly

Alternatively, new browsers support DOMParser:

var doc = new DOMParser().parseFromString(myString, 'text/html');

Once the HTML string has been parsed to the DOM, you can use DOM methods like getElementsByTagName or querySelectorAll to get all the script elements.

var scriptElements = doc.getElementsByTagName('script');

Finally, [].map can be used to obtain an array with the textContent of each script element.

var arrayScriptContents = [].map.call(scriptElements, function(el) {
    return el.textContent;
});

The full code would be

var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly
[].map.call(doc.getElementsByTagName('script'), function(el) {
    return el.textContent;
});

Upvotes: 18

kaz

Reputation: 1190

You have to escape the forward slash like so: \/.

 myString.split(/(<script>|<\/script>)/)
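For context, square brackets in a regex form a character class, so /[<script></script>]/ splits on any single one of those characters rather than on the tags themselves. The sketch below shows one way a split-only approach might produce the requested array. It assumes the tags appear exactly as <script> and </script>, with no attributes and no nesting, and the helper name scriptContentsBySplit is hypothetical. It uses a non-capturing alternation, because a capturing group would also put the tags themselves into the result.

```javascript
// Hypothetical helper: extracts script contents using split() only.
// Assumes plain <script>...</script> pairs, no attributes, no nesting.
function scriptContentsBySplit(html) {
    // Without a capturing group, the delimiters are dropped from the result,
    // so the pieces alternate: outside, inside, outside, inside, ...
    var pieces = html.split(/<script>|<\/script>/);
    return pieces.filter(function (piece, index) {
        return index % 2 === 1; // odd indices sit between an opening and a closing tag
    });
}

var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";
console.log(scriptContentsBySplit(myString));
// ["console.log('hello')", "console.log('world')"]
```

This only holds for input as regular as the example string. For arbitrary HTML, a DOM-based approach is more robust.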

Upvotes: 1
