totalnoob

Reputation: 2741

Loop through an HTML string and get element values with JavaScript

I want to use vanilla JS to loop through a string of HTML text and get the values of its elements. With jQuery I can do something like this:

var str1 = "<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";
$.each($(str1).find('h2'), function(index, value) {
    console.log($(value).text()); // logs each h2's text
});

As I understand it, $(str1) parses the string into DOM elements (wrapped in a jQuery object), and .text() then returns an element's text, here each h2's. But I want to do this in my Node app on the backend rather than on the client side, because it would be more efficient (?), and it would also be nice not to rely on jQuery.

Some context: I'm working on a blogging app, and I want to build the table of contents as an object on the server side.

Upvotes: 1

Views: 2468

Answers (2)

slevy1

Reputation: 3832

The best way to parse HTML is to use the DOM. But if all you have is a string of HTML, then, according to another Stack Overflow member, you can create a "dummy" DOM element and assign the string to its innerHTML, which lets you work with the result through the DOM, as follows:

var el = document.createElement( 'html' );
// a double-quoted string cannot span lines, so the pieces are concatenated
el.innerHTML = "<html><head><title>aTitle</title></head>" +
    "<body><div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>" +
    "</body></html>";


Now you have a couple of ways to access the data using the DOM, as follows:

var el = document.createElement('html');
el.innerHTML = "<html><head><title>aTitle</title></head><body><div><h2>This is a heading1</h2><h2>This is a heading2</h2></div></body></html>";

// one way: getElementsByTagName returns a live HTMLCollection
var h2s = el.getElementsByTagName("h2");
for (var i = 0, max = h2s.length; i < max; i++) {
    console.log(h2s[i].textContent);
}
console.log("\n"); // blank line between the two outputs

// and another: querySelectorAll takes a CSS selector and returns a static NodeList
var elementList = el.querySelectorAll("h2");
for (i = 0, max = elementList.length; i < max; i++) {
    console.log(elementList[i].textContent);
}

You may also use a regular expression, as follows:

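var str = '<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>';

var re = /<h2>([^<]*?)<\/h2>/g;
var match;
var m = [];
// exec() returns null once there are no more matches, which ends the loop
while ((match = re.exec(str)) !== null) {
    m.push(match.pop()); // pop() returns the last element: the captured text
}
console.log(m); // ["This is a heading1", "This is a heading2"]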

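The regex matches an opening h2 tag, captures any run of characters other than "<", and then matches a closing h2 tag. The "*" allows zero or more of those characters, and the "?" makes that quantifier lazy (non-greedy). Because [^<] can never consume the "<" of the closing tag, the lazy and greedy forms capture the same text here. The same restriction means a heading that contains other tags, such as <h2>An <em>example</em></h2>, is not matched at all.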
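To see the lazy-versus-greedy point and the nested-tag limitation in practice, here is a quick check, assuming Node.js 12 or later (or any modern browser) for String.prototype.matchAll, which iterates over every match of a global regex without a manual exec() loop. The captures helper and the test strings are illustrative additions, not part of the answer.

```javascript
// The answer's pattern (lazy) and the same pattern without the "?" (greedy).
const lazy = /<h2>([^<]*?)<\/h2>/g;
const greedy = /<h2>([^<]*)<\/h2>/g;

// Illustrative helper: collect capture group 1 from every match.
// matchAll throws unless the regex has the g flag.
const captures = (re, s) => Array.from(s.matchAll(re), m => m[1]);

const headingsHtml = '<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>';
const lazyResult = captures(lazy, headingsHtml);
const greedyResult = captures(greedy, headingsHtml);
console.log(lazyResult);   // [ 'This is a heading1', 'This is a heading2' ]
console.log(greedyResult); // same result: [^<] cannot run past the closing tag

// Limitation: a heading with markup inside contains "<", so it is skipped.
const nested = '<h2>Intro to <em>regex</em></h2><h2>Plain</h2>';
const nestedResult = captures(lazy, nested);
console.log(nestedResult); // [ 'Plain' ]
```

If headings in your posts may contain inline markup, this is where a real HTML parser becomes the safer choice.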

Per Ryan on Stack Overflow:

exec with a global regular expression is meant to be used in a loop, as it will still retrieve all matched subexpressions.

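The critical part of the regex is the "g" flag (see MDN). With it, each exec() call starts searching at re.lastIndex, the position just after the previous match, so repeated calls walk through every match in the string and return null when none are left, which ends the loop. Without the flag, exec() would return the first match every time and the loop would never terminate. On each iteration, match is an array with two elements: the full match (e.g. "<h2>This is a heading1</h2>") and the text captured by the parentheses. match.pop() removes and returns the last element, the captured text, which is then pushed onto m, so m ultimately contains all the heading texts.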
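A minimal sketch for watching the "g" flag at work: it reruns the same exec() loop, but instead of popping it records each match's full text, captured text, array length, and the regex's lastIndex after the call. It runs as-is in Node.js or a browser console; execRe, execHtml, and steps are illustrative names.

```javascript
const execHtml = '<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>';
const execRe = /<h2>([^<]*?)<\/h2>/g;

// Record what exec() returns on each pass and where the next search will start.
const steps = [];
let match;
while ((match = execRe.exec(execHtml)) !== null) {
  steps.push({
    full: match[0],              // the whole "<h2>...</h2>" match
    text: match[1],              // capture group 1, the value match.pop() returns
    length: match.length,        // 2: full match plus one capture group
    lastIndex: execRe.lastIndex, // where the next exec() call starts searching
  });
}
console.log(steps);
console.log(execRe.lastIndex); // 0: the final, failed exec() resets lastIndex
```

The first heading starts right after the five-character "<div>" and is 27 characters long, so lastIndex reads 32 after the first pass and 59 after the second.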

Upvotes: 1

Mulan

Reputation: 135197

This is another way that also uses .innerHTML, but relies on the built-in iterable protocol (through Array.from).

Here are the operations we'll need, their types, and the function that provides each:

  • Create an HTML element from a string
    String -> HTMLElement – provided by set Element#innerHTML

  • Get the text content of an HTML element
    HTMLElement -> String – provided by get Node#textContent

  • Find nodes matching a query selector
    (HTMLElement, String) -> NodeList – provided by Element#querySelectorAll

  • Transform a list of nodes to a list of text
    (NodeList, HTMLElement -> String) -> [String] – provided by Array.from
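The last step works because Array.from accepts any iterable (or array-like) object plus an optional mapping function, which is handy since a NodeList has forEach but no map method. The sketch below shows the same call shape without a DOM, so it runs in Node.js; headingNodes and its plain objects are hypothetical stand-ins for real DOM nodes.

```javascript
// Hypothetical stand-ins for DOM nodes: plain objects that only carry textContent.
function* headingNodes() {
  yield { textContent: "This is a heading1" };
  yield { textContent: "This is a heading2" };
}

// Array.from(iterable, mapFn) consumes the iterable and maps each item in one pass,
// the same shape findText uses to turn a NodeList into a list of strings.
const fromIterable = Array.from(headingNodes(), node => node.textContent);

// It also accepts array-likes: an object with a length and indexed items.
const arrayLike = { length: 2, 0: { textContent: "a" }, 1: { textContent: "b" } };
const fromArrayLike = Array.from(arrayLike, node => node.textContent);

console.log(fromIterable);  // [ 'This is a heading1', 'This is a heading2' ]
console.log(fromArrayLike); // [ 'a', 'b' ]
```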

// html2elem :: String -> HTMLElement
const html2elem = html =>
  {
    const elem = document.createElement ('div')
    elem.innerHTML = html
    return elem // the wrapper: childNodes[0] may be a text node, and querySelectorAll only searches descendants
  }

// findText :: (String, String) -> [String]
const findText = (html, selector) =>
  Array.from (html2elem(html).querySelectorAll(selector), e => e.textContent)

// str :: String  
const str =
  "<div><h1>MAIN HEADING</h1><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";

console.log (findText (str, 'h2'))
// [
//   "This is a heading1",
//   "This is a heading2"
// ]
// :: [String]

console.log (findText (str, 'h1'))
// [
//   "MAIN HEADING"
// ]
// :: [String]
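Both answers' DOM approaches rely on document, which exists in browsers but not in Node.js; on the server you would need a DOM implementation such as the jsdom package. For the server-side table of contents the question describes, here is a dependency-free sketch built on the regex approach instead, so it shares that approach's limits. buildToc and the { level, text } entry shape are assumptions for illustration, not something either answer specifies.

```javascript
// Hypothetical helper: list every <h1>..<h6> heading in document order as
// { level, text }. The \1 backreference makes the closing tag match the opening one.
// Like the regex answer, it skips headings with attributes (e.g. <h2 id="x">)
// or with nested tags inside them.
const buildToc = html =>
  Array.from(html.matchAll(/<h([1-6])>([^<]*)<\/h\1>/g), m => ({
    level: Number(m[1]),
    text: m[2],
  }));

const post =
  "<div><h1>MAIN HEADING</h1><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";

const toc = buildToc(post);
console.log(toc);
// [ { level: 1, text: 'MAIN HEADING' },
//   { level: 2, text: 'This is a heading1' },
//   { level: 2, text: 'This is a heading2' } ]
```

Keeping the level on each entry lets the blog template indent or nest the table of contents however it likes.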

Upvotes: 2
