what the

Reputation: 477

Splitting by HTML elements in JavaScript

I have a bunch of strings that typically look something like this:

string 1<div>string 2<br></div>string 3
string 1<div>string 2<br></div><div>string 3<br></div>
<div>string 1<br></div><div>string 2<br></div><div>string 3<br></div>

And I need to extract the text (both inside and outside/between elements, as seen above) into an array like this:

['string 1', 'string 2', 'string 3']

Is there a way to do this in pure JavaScript?

I tried something like this:

console.log(text.split(/<div>(.*)<br><\/div>/g))

But it only works for the first one:

[ 'string 1', 'string 2', 'string 3' ]

It fails on the last two variations:

[ 'string 1', 'string 2<br></div><div>string 3', '' ]
[ '', 'string 1<br></div><div>string 2<br></div><div>string 3', '' ]

Upvotes: 2

Views: 767

Answers (2)

Nicolas Bertho

Reputation: 22

It may not be the best solution, but I tried your three examples with this code:

let regex = /(<([^>]+)>)/ig;
let myArray = myString.replace(regex, "-").split("-");

You might want to replace the - character with something that cannot appear in your text, and you also need to filter the array to remove empty elements, but it works.
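One way to avoid the placeholder character and handle the empty elements is to split directly on the tag pattern and filter the result. This is only a minimal sketch: splitOnTags and samples are names made up for illustration, and it assumes the text itself never contains a literal < or > character.

```javascript
// Hypothetical helper: split on any HTML tag, then drop the empty pieces.
const splitOnTags = (myString) =>
  myString
    .split(/<[^>]+>/)        // no capture group, so the tags are not kept in the result
    .map(s => s.trim())      // strip stray whitespace around each piece
    .filter(s => s !== '');  // adjacent tags leave empty strings behind

// The three variations from the question.
const samples = [
  'string 1<div>string 2<br></div>string 3',
  'string 1<div>string 2<br></div><div>string 3<br></div>',
  '<div>string 1<br></div><div>string 2<br></div><div>string 3<br></div>'
];

samples.forEach(s => console.log(splitOnTags(s)));
```

For each of the three samples this returns ['string 1', 'string 2', 'string 3']. Splitting on the regex directly means there is no separator character that could collide with the text.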

Upvotes: 0

Nick

Reputation: 147146

Parsing the HTML with the DOM is generally more robust than using a regex. You can create a template element, load the HTML into it, and then map over the top-level child nodes of its content, returning each node's textContent. This picks up both the bare text nodes and the text inside the div elements:

const html = [
  'string 1<div>string 2<br></div>string 3',
  'string 1<div>string 2<br></div><div>string 3<br></div>',
  '<div>string 1<br></div><div>string 2<br></div><div>string 3<br></div>'
];

const getTextContent = (html) => {
  // A template element parses the markup without rendering it or running scripts.
  const tmp = document.createElement('template');
  tmp.innerHTML = html;
  // Top-level children are either bare text nodes or <div> elements;
  // textContent yields the text of both.
  return Array.from(tmp.content.childNodes, n => n.textContent);
};

html.forEach(h => console.log(getTextContent(h)));

Upvotes: 4
