Reputation: 1604
Let's say I have the following string:
var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>"
I would like to use split to get an array with the contents of the script tags. e.g. I want my output to be:
["console.log('hello')", "console.log('world')"]
I tried myString.split(/[<script></script>]/), but did not get the expected output.
Any help is appreciated.
Upvotes: 7
Views: 12648
Reputation: 2254
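JavaScript code:
function myFunction() {
  var str = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";
  // With the g flag, match() returns the whole matches (tags included), so collect capture group 1 with exec() instead.
  var re = /<script\b[^>]*>([\s\S]*?)<\/script>/g, m, contents = [];
  while ((m = re.exec(str)) !== null) contents.push(m[1]);
  console.log(contents); // ["console.log('hello')", "console.log('world')"]
}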
Upvotes: 2
Reputation: 288120
You can't parse (X)HTML with regex.
Instead, you can parse it using innerHTML.
var element = document.createElement('div');
element.innerHTML = myString; // Parse HTML properly (but unsafely)
However, this is not safe. Even if innerHTML doesn't run the JS inside script elements, malicious strings can still run arbitrary JS, e.g. with <img src="//" onerror="alert()">.
To avoid that problem, you can use DOMImplementation.createHTMLDocument to create a new document, which can be used as a sandbox.
var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly
Alternatively, new browsers support DOMParser:
var doc = new DOMParser().parseFromString(myString, 'text/html');
Once the HTML string has been parsed to the DOM, you can use DOM methods like getElementsByTagName or querySelectorAll to get all the script elements.
var scriptElements = doc.getElementsByTagName('script');
Finally, [].map can be used to obtain an array with the textContent of each script element.
var arrayScriptContents = [].map.call(scriptElements, function(el) {
  return el.textContent;
});
The full code would be:
var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly
[].map.call(doc.getElementsByTagName('script'), function(el) {
  return el.textContent;
});
Upvotes: 18
Reputation: 1190
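You have to escape the forward slash with a backslash, like so: \/. Also drop the square brackets: [ ... ] is a character class that matches any single one of the listed characters, not the tag as a whole.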
myString.split(/(<script>|<\/script>)/)
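Note that because the pattern contains a capturing group, split also returns the delimiters themselves, along with the markup outside the script tags. A minimal sketch of one way to keep only the script contents, assuming the tags are exactly <script> and </script> with no attributes (the names parts and scriptContents are just for illustration):

```javascript
var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";

// split() with a capturing group keeps the matched delimiters in the result,
// so the array alternates between surrounding markup, tags, and script bodies.
var parts = myString.split(/(<script>|<\/script>)/);

// Keep only the pieces that come immediately after an opening <script> tag.
var scriptContents = parts.filter(function (part, i) {
  return parts[i - 1] === "<script>";
});

console.log(scriptContents);
```

This is still string matching rather than real HTML parsing, so it will break on script tags with attributes, different casing, or the text </script> inside a string literal.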
Upvotes: 1