Reputation: 2741
I want to use vanilla JS to loop through a string of HTML text and get its values. With jQuery I can do something like this:
var str1 = "<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";
$.each($(str1).find('h2'), function(index, value) {
  console.log($(value).text());
});
As I understand it, $(str1) converts the string into HTML elements, and we can then use .text() to get the value of an element (h2).
But I want to do this within my Node app on the back end rather than on the client side, because it'd be more efficient (?), and it'd also be nice not to rely on jQuery.
Some context: I'm working on a blogging app, and I want a table of contents built into an object on the server side.
Upvotes: 1
Views: 2468
Reputation: 3832
The best way to parse HTML is to use the DOM. If all you have is a string of HTML, then (according to this Stack Overflow member) you can create a "dummy" DOM element and assign the string to it, which lets you work with the result through the DOM, as follows:
var el = document.createElement('html');
// A quoted string cannot span lines, so the pieces are concatenated.
el.innerHTML = "<html><head><title>aTitle</title></head>" +
  "<body><div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>" +
  "</body></html>";
Now you have a couple of ways to access the data using the DOM, as follows:
var el = document.createElement('html');
el.innerHTML = "<html><head><title>aTitle</title></head><body><div><h2>This is a heading1</h2><h2>This is a heading2</h2></div></body></html>";
// one way: getElementsByTagName returns a live HTMLCollection
var h2s = el.getElementsByTagName("h2");
for (var i = 0, max = h2s.length; i < max; i++) {
  console.log(h2s[i].textContent);
  // print a blank line after the last heading to separate the two outputs
  if (i == max - 1) console.log("\n");
}
// and another: querySelectorAll returns a static NodeList
var elementList = el.querySelectorAll("h2");
for (i = 0, max = elementList.length; i < max; i++) {
  console.log(elementList[i].textContent);
}
You may also use a regular expression, as follows:
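var str = '<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>';
var re = /<h2>([^<]*?)<\/h2>/g;
var match;
var m = [];
// exec() returns null once there are no more matches
while ((match = re.exec(str)) !== null) {
  // the last element of match is the captured heading text
  m.push(match.pop());
}
console.log(m);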
The regex matches an opening <h2> tag, followed by any run of characters that are not "<", followed by a closing </h2> tag. The parentheses capture that run, and the "*?" quantifier matches zero or more characters lazily, that is, as few as possible.
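To make the pattern's reach concrete, here is a minimal sketch that runs in Node without a DOM. The helper name headingTexts and both sample strings are made up for illustration, and String.prototype.matchAll is swapped in for the exec() loop purely for brevity. The second call shows a limit of the pattern: because the captured run may not contain "<", a heading with markup inside it is skipped.

```javascript
// Hypothetical helper: collect the captured text of every <h2>...</h2> pair.
function headingTexts(html) {
  const re = /<h2>([^<]*?)<\/h2>/g;
  // matchAll requires the "g" flag and yields one match array per hit;
  // m[1] is the text captured by the parentheses.
  return Array.from(html.matchAll(re), (m) => m[1]);
}

const plain = '<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>';
// Assumed sample: the first heading contains a nested tag, the second is empty.
const nested = '<h2>Has <em>markup</em> inside</h2><h2></h2>';

console.log(headingTexts(plain));  // [ 'This is a heading1', 'This is a heading2' ]
console.log(headingTexts(nested)); // [ '' ]
```

If headings may contain nested tags or attributes, a regex this simple is likely to miss them, which is one reason the DOM-based approaches are generally the safer choice.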
Per Ryan of Stack Overflow:
exec with a global regular expression is meant to be used in a loop, as it will still retrieve all matched subexpressions.
The critical part of the regex is the "g" flag, as per MDN. It allows the exec() method to obtain multiple matches in a given string, because each call resumes where the previous match ended. In each loop iteration, match is an array holding the full matched text followed by the captured group. pop() removes that last element, the captured heading text, which is then pushed onto m, so m ultimately contains all the captured text values.
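Since the question asks for a table of contents held in an object on the server, the same exec() loop could be sketched roughly like this. The function name buildToc and the { position, text } shape are assumptions for illustration only; the question does not specify a format. It runs in plain Node with no DOM.

```javascript
// Hypothetical helper: build a simple table of contents from <h2> headings.
function buildToc(html) {
  const re = /<h2>([^<]*?)<\/h2>/g;
  const toc = [];
  let match;
  // With the "g" flag, each exec() call resumes from re.lastIndex
  // and returns null once no further match exists.
  while ((match = re.exec(html)) !== null) {
    // match[0] is the full "<h2>...</h2>" text, match[1] is the captured
    // heading, and match.index is where the match starts in the input.
    toc.push({ position: match.index, text: match[1] });
  }
  return toc;
}

const str = '<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>';
console.log(buildToc(str));
```

Keeping match.index alongside the text is just one option; it preserves document order if headings of several levels are later merged into one list.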
Upvotes: 1
Reputation: 135197
This is another way using .innerHTML, but it uses the built-in iterable protocol.
Here are the operations we'll need, the types they have, and a link to the documentation for each function:
Create an HTML element from text
String -> HTMLElement
– provided by set Element#innerHTML
Get the text contents of an HTML element
HTMLElement -> String
– provided by get Node#textContent
Find nodes matching a query selector
(HTMLElement, String) -> NodeList
– provided by Element#querySelectorAll
Transform a list of nodes to a list of text
(NodeList, HTMLElement -> String) -> [String]
– provided by Array.from
// html2elem :: String -> HTMLElement
const html2elem = html =>
{
  const elem = document.createElement ('div')
  elem.innerHTML = html
  return elem.childNodes[0]
}
// findText :: (String, String) -> [String]
const findText = (html, selector) =>
  Array.from (html2elem(html).querySelectorAll(selector), e => e.textContent)
// str :: String
const str =
"<div><h1>MAIN HEADING</h1><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";
console.log (findText (str, 'h2'))
// [
// "This is a heading1",
// "This is a heading2"
// ]
// :: [String]
console.log (findText (str, 'h1'))
// [
// "MAIN HEADING"
// ]
// :: [String]
Upvotes: 2