Reputation: 63587
Given this HTML as a string "html", how can I split it into an array where each header <h marks the start of an element?
Begin with this:
<h1>A</h1>
<h2>B</h2>
<p>Foobar</p>
<h3>C</h3>
Result:
["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
What I've tried:
I wanted to use String.split() with a regex, but the result splits each <h into its own element. I need to figure out how to capture from the start of one <h until the next <h, including the first one but excluding the second.
var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
var foo = html.split(/(<h)/);
Edit: Regex is not a requirement in any way; it's just the only solution that I thought would work for generally splitting HTML strings in this way.
Upvotes: 17
Views: 53577
Reputation: 196
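I just came across this question because I needed the same thing in one of my projects. I did the following, which works well for HTML strings in which the tags sit directly next to each other (no whitespace between ">" and "<"). Note that it splits between every pair of adjacent tags, not only at headers.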
// Splitting on "><" removes those two characters at every cut, so put them back.
let splitArray = data.split("><").map((item, index, parts) => {
  // every piece except the first lost its leading "<"
  const prefix = index === 0 ? "" : "<"
  // every piece except the last lost its trailing ">"
  const suffix = index === parts.length - 1 ? "" : ">"
  return prefix + item + suffix
})
console.log(splitArray)
where data is the HTML string
Upvotes: 1
Reputation: 600
Hi, I used this function to convert an HTML string into an array of its tags and text pieces:
function getArrayTagsHtmlString(str) {
  const arrayElements = []
  // split(">") removes every ">", so it is re-added to each piece that contains a tag
  str.split(">").forEach((element) => {
    // skip the empty piece that is left over after a trailing ">"
    if (element === "") return
    arrayElements.push(element.includes("<") ? element + ">" : element)
  })
  return arrayElements
}
Happy code
Upvotes: 0
Reputation: 338158
From the comments to the question, this seems to be the task:
I'm taking dynamic markdown that I'm scraping from GitHub. Then I want to render it to HTML, but wrap every title element in a ReactJS <WayPoint> component.
The following is a completely library-agnostic, DOM-API based solution.
function waypointify(html) {
    var div = document.createElement("div"), nodes;

    // parse HTML and convert into an array (instead of NodeList)
    div.innerHTML = html;
    nodes = [].slice.call(div.childNodes);

    // add <waypoint> elements and distribute nodes by headings
    div.innerHTML = "";
    nodes.forEach(function (node) {
        if (!div.lastChild || /^h[1-6]$/i.test(node.nodeName)) {
            div.appendChild( document.createElement("waypoint") );
        }
        div.lastChild.appendChild(node);
    });

    return div.innerHTML;
}
Doing the same with a modern library in fewer lines of code is absolutely possible; see it as a challenge.
This is what it produces with your sample input:
<waypoint><h1>A</h1></waypoint>
<waypoint><h2>B</h2><p>Foobar</p></waypoint>
<waypoint><h3>C</h3></waypoint>
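If you want the array from the question rather than wrapped markup, the same grouping rule can be sketched as follows. groupByHeading is a hypothetical helper, not part of the function above; it only reads nodeName, outerHTML and textContent, so the plain objects below stand in for real DOM nodes and the sketch also runs outside a browser.

```javascript
// Hypothetical helper: group a flat list of nodes into HTML strings,
// starting a new group at every h1-h6 (same test as in waypointify).
function groupByHeading(nodes) {
    var groups = [];
    Array.prototype.forEach.call(nodes, function (node) {
        if (groups.length === 0 || /^h[1-6]$/i.test(node.nodeName)) {
            groups.push("");
        }
        // Elements have outerHTML; text nodes only have textContent,
        // which is appended unescaped here (a simplification).
        groups[groups.length - 1] +=
            typeof node.outerHTML === "string" ? node.outerHTML : (node.textContent || "");
    });
    return groups;
}

// Stand-ins for div.childNodes (assumed shape, for illustration only)
var sampleNodes = [
    { nodeName: "H1", outerHTML: "<h1>A</h1>" },
    { nodeName: "H2", outerHTML: "<h2>B</h2>" },
    { nodeName: "P",  outerHTML: "<p>Foobar</p>" },
    { nodeName: "H3", outerHTML: "<h3>C</h3>" }
];
console.log(groupByHeading(sampleNodes));
// ["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
```

In a browser, passing div.childNodes (after div.innerHTML = html) instead of sampleNodes should give the same array.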
Upvotes: 10
Reputation: 47099
In your example you can use:
/
  <h     // Match literal <h
  (.)    // Match any character and save it in a group
  >      // Match literal >
  .*?    // Match any character zero or more times, non-greedy
  <\/h   // Match literal </h
  \1     // Match what was previously captured by (.)
  >      // Match literal >
/g
var str = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>'
str.match(/<h(.)>.*?<\/h\1>/g); // ["<h1>A</h1>", "<h2>B</h2>", "<h3>C</h3>"]
But please don't parse HTML with regex; read "RegEx match open tags except XHTML self-contained tags".
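If you need the grouping from the question (each header together with whatever follows it), one possible sketch is to split on a zero-width lookahead instead of matching. splitAtHeadings is a made-up name, and the same caveat applies: it assumes the headers are top-level and that "<h1" to "<h6" never appears inside comments, attributes or scripts.

```javascript
// Hypothetical helper: cut the string *before* every opening <h1>..<h6> tag.
function splitAtHeadings(html) {
  // (?=...) is a lookahead: it matches a position and consumes nothing,
  // so the header tag stays at the start of its own chunk.
  // [1-6] rules out tags like <hr> and <header>; [\s>] allows attributes.
  return html.split(/(?=<h[1-6][\s>])/i);
}

var str = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
console.log(splitAtHeadings(str));
// ["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
```

Anything that comes before the first header ends up as its own first element.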
Upvotes: 26
Reputation: 3940
I'm sure someone could reduce the for loop that puts the angle brackets back in, but this is how I'd do it.
var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
// split on ><
var arr = html.split(/></g);
// split removes the >< so we need to determine where to put them back in
for (var i = 0; i < arr.length; i++) {
    if (arr[i].substring(0, 1) != '<') {
        arr[i] = '<' + arr[i];
    }
    if (arr[i].slice(-1) != '>') {
        arr[i] = arr[i] + '>';
    }
}
Alternatively, we could remove the first and last bracket, do the split, and then add the angle brackets back to every piece.
var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
// remove first and last characters
html = html.substring(1, html.length - 1);
// do the split on ><
var arr = html.split(/></g);
// add the brackets back in
for (var i = 0; i < arr.length; i++) {
    arr[i] = '<' + arr[i] + '>';
}
Oh, of course this will fail with elements that have no content, such as <p></p>, because their opening and closing tags are also separated by ><.
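As a rough sketch of both remarks (a shorter alternative to the loop, and the empty-element problem), a lookbehind plus a lookahead can split between > and < without removing either character. splitBetweenTags is a hypothetical name, and this assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Hypothetical helper: split at every position with ">" on the left and "<" on the right.
// Both assertions are zero-width, so nothing is consumed and nothing has to be re-added.
function splitBetweenTags(html) {
    return html.split(/(?<=>)(?=<)/);
}

console.log(splitBetweenTags('<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>'));
// ["<h1>A</h1>", "<h2>B</h2>", "<p>Foobar</p>", "<h3>C</h3>"]

// Same limitation as above: an element with no content is cut in two.
console.log(splitBetweenTags('<p></p><h1>A</h1>'));
// ["<p>", "</p>", "<h1>A</h1>"]
```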
Upvotes: 2