Reputation: 1955
Say I have
var string =
"<h1>Header</h1>
<p>this is a small paragraph</p>
<ul>
<li>list element 1.</li>
<li>list element 2.</li>
<li>list element 3. With a small update.</li>
</ul>"
//newlines for clarity only
How can I split this string using JavaScript, so that I get
var array = string.split(/*...something here*/)
array = [
"<h1>Header</h1>",
"<p>this is a small paragraph</p>",
"<ul><li>list element 1.</li><li>list element 2.</li><li>list element 3. With a small update.</li></ul>"
]
I only want to split on the top-level HTML elements, not their children.
Upvotes: 2
Views: 9843
Reputation: 652
A performant solution ( http://jsperf.com/spliting-html ):
var splitter = document.createElement('div'),
text = splitter.innerHTML = "<h1>Header</h1>\
<p>this is a small paragraph</p>\
<ul>\
<li>list element 1.</li>\
<li>list element 2.</li>\
<li>list element 3. With a small update.</li>\
</ul>",
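// children holds only the top-level elements, not their descendants
parts = Array.prototype.map.call(splitter.children, function (el) {
  return el.outerHTML; // outerHTML keeps the element's own tags; innerHTML would drop them
});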
Upvotes: 2
Reputation: 27525
You can't do this with regular expressions. Your regular expression will fail if you have several nested elements of the same type, e.g.
<div>
<div>
<div>
</div>
</div>
</div>
This is because regular expressions can only recognize regular languages, while HTML with arbitrarily nested elements is a context-free language (and context-free languages are strictly more expressive than regular ones).
See also: https://stackoverflow.com/a/1732454/2170192
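To make the "more complex than regular" point concrete, here is a minimal sketch of a splitter that tracks nesting depth with a counter, which is exactly the bookkeeping a single pattern match cannot do. The function name splitTopLevel is hypothetical, and the sketch assumes well-formed markup where every element has an explicit end tag (no void elements such as <br>, no comments or scripts, and no '<' or '>' inside attribute values). It is not a substitute for a real HTML parser.

```javascript
// Hypothetical helper (not from any library): splits markup into its
// top-level elements by counting how deeply we are nested.
// Assumes every element has an explicit end tag and there are no
// comments, scripts, or angle brackets inside attribute values.
function splitTopLevel(html) {
  // Used only to find tags; the depth counter handles the nesting.
  var tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  var parts = [];
  var depth = 0;
  var start = 0;
  var match;
  while ((match = tagPattern.exec(html)) !== null) {
    if (match[1] === '/') {
      depth -= 1;
      if (depth === 0) {
        // End tag of a top-level element: emit everything since its start tag.
        parts.push(html.slice(start, tagPattern.lastIndex));
      }
    } else {
      if (depth === 0) {
        start = match.index; // start tag of a new top-level element
      }
      depth += 1;
    }
  }
  return parts;
}

var nested = splitTopLevel('<div><div><div></div></div></div><p>x</p>');
// nested has two entries: the whole outer <div> and the <p>
```

The regular expression here only locates tags; deciding which end tag closes which element is left to the counter.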
But if you don't have nested elements of the same type, you may split your HTML string by taking all matches returned by the following regular expression (which uses backreferences):
/<(\w+).*<\/\1\s*>/igsm
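As a quick check, the pattern can be run against the string from the question with String.prototype.match. This is only a sketch: the variable names are made up for illustration, and the s flag needs an ES2018 or newer engine. It also shows one limitation worth knowing about: because .* is greedy, two top-level elements with the same tag name end up in a single match.

```javascript
// Sample input from the question, written without the newlines.
var html = '<h1>Header</h1>' +
  '<p>this is a small paragraph</p>' +
  '<ul><li>list element 1.</li><li>list element 2.</li>' +
  '<li>list element 3. With a small update.</li></ul>';

// The pattern from this answer; the `s` flag requires ES2018+.
var topLevel = /<(\w+).*<\/\1\s*>/igsm;

// With the `g` flag, match() returns all matches as an array of strings.
var parts = html.match(topLevel);
// parts has three entries: the <h1>, the <p>, and the whole <ul>

// Limitation: greedy `.*` runs to the LAST matching end tag, so
// same-named top-level siblings are merged into one match.
var merged = '<p>a</p><p>b</p>'.match(topLevel);
// merged has a single entry containing both paragraphs
```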
<(\w+) matches a less-than sign and one or more word characters (letters, digits, underscores), capturing the word characters in the first capturing group.
.* matches the contents of the element.
<\/ matches the opening of the end tag.
\1 is the backreference, which matches exactly the sequence of characters captured by the first capturing group.
\s*> matches optional whitespace and the greater-than sign.
igsm are the modifiers: case-insensitive, global, dot-matches-all-symbols and multi-line.
Upvotes: 1
Reputation: 298146
You could do something like this:
var string = '<div><p></p></div><h1></h1>';
var elements = $(string).map(function() {
  return $('<div>').append(this).html(); // Basically `this.outerHTML`
}).get(); // .get() turns the jQuery object returned by .map() into a plain array
With the string from the question as input, the result is:
["<h1>Header</h1>", "<p>this is a small paragraph</p>", "<ul> <li>list element 1.</li> <li>list element 2.</li> <li>list element 3. With a small update.</li></ul>"]
Upvotes: 3