Vadim Loboda

Reputation: 3101

JavaScript regex: how to split an HTML string into an array of HTML elements and text nodes?

For example, this HTML string:

Lorem <b>ipsum</b> dolor <span class="abc">sit</span> amet,<br/>consectetur <input value="ok"/> adipiscing elit.

should be split into this array:

[ 
  'Lorem ',
  '<b>ipsum</b>',
  ' dolor ', 
  '<span class="abc">sit</span>', 
  ' amet,', 
  '<br/>', 
  'consectetur ', 
  '<input value="ok"/>', 
  ' adipiscing elit.'
]

Here is a pattern that matches the HTML elements, but not the text between them:

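// First alternative: paired tags such as <b>...</b>; second: self-closing tags such as <br/>.
const pattern = /<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)<\/\1>|<([A-Z][A-Z0-9]*).*?\/>/gi;
const html = 'Lorem <b>ipsum</b> dolor <span class="abc">sit</span> amet,<br/>consectetur <input value="ok"/> adipiscing elit.';
// match() with the g flag returns only the matched elements, so the text nodes are lost.
const nodes = html.match(pattern);

console.log(nodes);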

How can I include the text nodes in the result as well?

Upvotes: 1

Views: 2061

Answers (1)

CertainPerformance

Reputation: 370689

If the HTML is well-formed, consider using DOMParser instead of a regular expression: parse the string, iterate over the body's child nodes, and take each node's .outerHTML (for element nodes) or .textContent (for text nodes):

const str = `Lorem <b>ipsum</b> dolor <span class="abc">sit</span> amet,<br/>consectetur <input value="ok"/> adipiscing elit.`;

const doc = new DOMParser().parseFromString(str, 'text/html');
const arr = [...doc.body.childNodes]
  .map(child => child.outerHTML || child.textContent);
console.log(arr);

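Note that the parser normalizes the markup, so void elements come back without the trailing slash (<br> and <input value="ok">).

You don't have to use DOMParser. You could also assign the string to the innerHTML of an ordinary element on the page and then read that element's child nodes, but with untrusted input that can allow arbitrary code execution (for example through inline event handlers such as onerror), so it should be avoided.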
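If no DOM is available (for example in plain Node.js), the pattern from the question can still be used by walking its matches and collecting the text between them. The following is only a minimal sketch: splitHtml is a hypothetical helper name, and it inherits the limits of the regex, so it will not handle nested tags of the same name or void tags written without the slash (such as <br>).

```javascript
// Element pattern taken from the question: paired tags or self-closing tags.
const pattern = /<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)<\/\1>|<([A-Z][A-Z0-9]*).*?\/>/gi;

// splitHtml is a hypothetical helper for illustration, not part of any library.
function splitHtml(html) {
  const parts = [];
  let last = 0; // index just past the previous matched element
  for (const match of html.matchAll(pattern)) {
    // Text between the previous element and this one becomes a text entry.
    if (match.index > last) parts.push(html.slice(last, match.index));
    parts.push(match[0]);
    last = match.index + match[0].length;
  }
  // Trailing text after the last element.
  if (last < html.length) parts.push(html.slice(last));
  return parts;
}

const html = 'Lorem <b>ipsum</b> dolor <span class="abc">sit</span> amet,<br/>consectetur <input value="ok"/> adipiscing elit.';
const parts = splitHtml(html);
console.log(parts);
```

Unlike the DOMParser version, this keeps each element exactly as written in the input, including the self-closing slashes.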

Upvotes: 4
