Reputation: 477
I have a bunch of strings that typically look something like this:
string 1<div>string 2<br></div>string 3
string 1<div>string 2<br></div><div>string 3<br></div>
<div>string 1<br></div><div>string 2<br></div><div>string 3<br></div>
And I need to extract the text (both inside and outside/between elements, as seen above) into an array like this:
['string 1', 'string 2', 'string 3']
Is there a way to do this in pure JavaScript?
I tried something like this:
console.log(text.split(/<div>(.*)<br><\/div>/g))
But it only works for the first one:
[ 'string 1', 'string 2', 'string 3' ]
It fails on the last two variations:
[ 'string 1', 'string 2<br></div><div>string 3', '' ]
[ '', 'string 1<br></div><div>string 2<br></div><div>string 3', '' ]
Upvotes: 2
Views: 767
Reputation: 22
It may not be the best solution, but I tried your three examples with this code:
let regex = /(<([^>]+)>)/ig;
let myArray = myString.replace(regex, "-").split("-");
You may want to replace the - placeholder with a character that cannot occur in your text, and you also need to filter the resulting array to remove empty elements, but it works.
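The same idea can be done in one step by splitting on the tag pattern directly, which avoids having to pick a placeholder character. This is only a minimal sketch: `extractText` is a hypothetical helper name, the `trim` step is an assumption about the wanted output, and the pattern assumes simple tags like the ones in the question (no `>` inside attribute values).

```javascript
// Hypothetical helper: split on every tag, then drop the empty pieces.
function extractText(html) {
  return html
    .split(/<[^>]+>/)              // cut the string at each tag such as <div>, <br>, </div>
    .map(part => part.trim())      // assumption: surrounding whitespace is not wanted
    .filter(part => part !== '');  // adjacent tags leave empty strings behind; remove them
}

const samples = [
  'string 1<div>string 2<br></div>string 3',
  'string 1<div>string 2<br></div><div>string 3<br></div>',
  '<div>string 1<br></div><div>string 2<br></div><div>string 3<br></div>'
];

samples.forEach(s => console.log(extractText(s)));
```

For the three sample strings from the question, each call returns ['string 1', 'string 2', 'string 3'].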
Upvotes: 0
Reputation: 147146
A pure JavaScript approach using the DOM is generally more robust than regex for parsing HTML. You can create a template element, load the HTML into it, and then map each of its top-level child nodes (bare text nodes and div elements alike) to its textContent, dropping any empty results:
const html = [
  'string 1<div>string 2<br></div>string 3',
  'string 1<div>string 2<br></div><div>string 3<br></div>',
  '<div>string 1<br></div><div>string 2<br></div><div>string 3<br></div>'
]
const getTextContent = (html) => {
  const tmp = document.createElement('template');
  tmp.innerHTML = html;
  // Filtering on n.nodeType === Node.TEXT_NODE would drop the text wrapped in <div>s,
  // so take the textContent of every top-level child node instead.
  return Array.from(tmp.content.childNodes, n => n.textContent).filter(t => t !== '');
}
html.forEach(h => console.log(getTextContent(h)));
Upvotes: 4