Reputation: 431
I have a string of elements spread over multiple lines (I can put it all on one line if necessary), and I want to split it on the <section> element. I thought this would be easy: just str.split(regex), or even str.split('<section'), but it's not working. It never breaks the sections out.
I've tried using a regular expression: SecRegex = /<section.*?>[\s\S]*?<\/section>/; var fndSection = result.split(SecRegex);
I've also tried var fndSection = result.split('<section');
I've looked all over the net, and from what I've found, one of the two methods above should have worked.
result = '
<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter id="chap2"> <title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter> <title>Chapter Title</title>
<section id="Section ID">
<title>Section Title</title>
<para0>
<title>Para0 Title</title>
<para>blah blah</para>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<line>Title</line>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<list>Title</list>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<ipbchap>
<tags></tags>
</ipbchap>
</body>
<rear>
<tags></tags>
</rear>
</doc>'
Code
SecRegex = /<section.*?>[\s\S]*?<\/section>/;
var fndSection = result.split(SecRegex);
console.log("result string " + fndSection);
This is the result I'm getting from the code I have
result string <chapter id="chap2"> <title>THEORY</title> , , , , <chapter id="chap1"> <para0> <title></title></para0> </chapter>
result string <chapter id="chap1"> <para0> <title></title></para0> </chapter>
result string <chapter
As you can see, the sections themselves are missing from the output.
What I want is an array containing each <section>.*?</section> string.
Thank you everyone for looking at this and helping me. I appreciate all your help.
Maxine
Upvotes: 1
Views: 108
Reputation: 43880
Do not use RegEx on HTML (or any cousin of HTML). Collect your <section>s into a NodeList, convert that NodeList into an Array, then convert each Node into a String. This can be done in one line:
const strings = Array.from(document.querySelectorAll('section')).map(section => section.outerHTML);
The following demo is a breakdown of the example above.
// Collect all <section>s into a NodeList
const sections = document.querySelectorAll('section');
// Convert NodeList into an Array
const array = Array.from(sections);
/*
Iterate through Array -- on each <section>...
convert it into a String
*/
const strings = array.map(section => section.outerHTML);
// View array as a template literal for a cleaner look
console.log(`${strings}`);
// Verifying it's an array of multiple elements
console.log(strings.length);
// Verifying that they are in fact strings
console.log(typeof strings[0]);
<chapter id="chap1">
<para0>
<title></title>
</para0>
</chapter>
<chapter id="chap2">
<title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<chapter id="chap1">
<para0>
<title></title>
</para0>
</chapter>
<chapter id="chap1">
<para0>
<title></title>
</para0>
</chapter>
<chapter>
<title>Chapter Title</title>
<section id="Section ID">
<title>Section Title</title>
<para0>
<title>Para0 Title</title>
<para>blah blah</para>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<line>Title</line>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<list>Title</list>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<ipbchap>
<tags></tags>
</ipbchap>
Upvotes: 1
Reputation: 27723
Your expression looks pretty great! You might just want to slightly modify it, maybe to something similar to:
/<section[a-z="'\s]+>([\s\S]*?)<\/section>/gmi
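If this isn't the expression you want, you can modify and test it on regex101.com. You can also visualize your expressions on jex.im.
The snippet below replaces each match with its first capturing group, which strips the section tags and keeps their contents: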
const regex = /<section[a-z="'\s]+>([\s\S]*?)<\/section>/gmi;
const str = `<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter id="chap2"> <title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
In case you might want to capture the section tags as well, you can simply wrap your entire expression in a capturing group:
const regex = /(<section[a-z="'\s]+>([\s\S]*?)<\/section>)/gmi;
const str = `<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter id="chap2"> <title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>`;
const subst = `\n$1\n`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: \n', result);
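One caveat with the pattern above, shown as a rough sketch: the class [a-z="'\s]+ only admits letters, =, quotes, and whitespace inside the opening tag, so a section whose id contains a digit or a hyphen would be skipped. The input string, the id "sec-1", and the names strict and loose are made up for illustration; the looser [^>]* variant is an assumption, not the pattern from this answer.

```javascript
// Hypothetical input: the second id contains a hyphen and a digit.
const input =
  '<section id="Theory"><title>T</title></section>\n' +
  '<section id="sec-1"><title>S</title></section>';

// Pattern from this answer: the attribute part allows only letters,
// '=', quotes, and whitespace.
const strict = /<section[a-z="'\s]+>([\s\S]*?)<\/section>/gi;

// Looser variant (an assumption): accept anything up to the '>' that
// closes the opening tag.
const loose = /<section\b[^>]*>([\s\S]*?)<\/section>/gi;

// match() returns null when nothing matches, so fall back to [].
const strictMatches = input.match(strict) || [];
const looseMatches = input.match(loose) || [];

console.log(strictMatches.length); // sections found by the strict pattern
console.log(looseMatches.length);  // sections found by the loose pattern
```

If your ids only ever contain letters and spaces, as in the question's sample, the stricter class should be enough.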
Upvotes: 2
Reputation: 28970
You don't need to split the string - you want to extract the data that matches your pattern from it. You can do that using String#match. Note that you need to add the g flag to get all matches:
var result = `<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter id="chap2"> <title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter> <title>Chapter Title</title>
<section id="Section ID">
<title>Section Title</title>
<para0>
<title>Para0 Title</title>
<para>blah blah</para>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<line>Title</line>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<list>Title</list>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<ipbchap>
<tags></tags>
</ipbchap>
</body>
<rear>
<tags></tags>
</rear>
</doc>`;
// the g flag is added -------------------------↓
var SecRegex = /<section.*?>[\s\S]*?<\/section>/g;
var fndSection = result.match(SecRegex);
console.log("result string ", fndSection);
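To see why split appeared to lose the sections, here is a minimal sketch on a shortened stand-in for the markup. The sample string and the names sample, sectionPattern, between, and found are hypothetical; the pattern is the same one used above.

```javascript
// Small stand-in for the question's markup (hypothetical sample data).
const sample = [
  '<chapter id="c1">',
  '<section id="a"><title>A</title></section>',
  '<section id="b"><title>B</title></section>',
  '</chapter>',
].join('\n');

// Same pattern as above, with the g flag.
const sectionPattern = /<section.*?>[\s\S]*?<\/section>/g;

// split() returns the text BETWEEN the matches and discards the matches.
const between = sample.split(sectionPattern);

// match() with the g flag returns the matches themselves.
// It returns null when nothing matches, so fall back to an empty array.
const found = sample.match(sectionPattern) || [];

console.log(between); // the chapter tags and newlines, no sections
console.log(found);   // one string per <section>...</section>
```

This is why the original output was mostly commas: the array held the leftovers around each section, and concatenating it with a string joined those leftovers with commas.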
However, you are better off parsing the DOM and extracting the information you want from there - this is simple using DOMParser:
var result = `<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter id="chap2"> <title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter id="chap1">
<para0><title></title></para0>
</chapter>
<chapter> <title>Chapter Title</title>
<section id="Section ID">
<title>Section Title</title>
<para0>
<title>Para0 Title</title>
<para>blah blah</para>
</para0>
</section>
<section id="Next section">
<title>title</title>
<para0>
<line>Title</line>
<text>blah blah</text>
</para0>
</section>
<section id="More sections">
<title>title</title>
<para0>
<list>Title</list>
<text>blah blah</text>
</para0>
</section>
<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
<ipbchap>
<tags></tags>
</ipbchap>
</body>
<rear>
<tags></tags>
</rear>
</doc>`
var parser = new DOMParser();
var doc = parser.parseFromString(result, "text/html");
var sections = [...doc.getElementsByTagName("section")];
var fndSection = sections.map(section => section.outerHTML)
console.log(fndSection);
Upvotes: 1