Maximilian Sun

Reputation: 147

Iterate through all HTML tags, including children, in JavaScript

To clarify what I'm trying to do: I'm making a Chrome extension that loops through the HTML of the current page and removes HTML tags containing certain text. But I'm having trouble looping through every HTML tag.

I've done a bunch of searching for the answer and pretty much every answer says to use:

var items = document.getElementsByTagName("*");
for (var i = 0; i < items.length; i++) {
    // do stuff
}

However, I've noticed that if I rebuild the HTML from the page using the elements in "items," I get something different than the page's actual HTML.

For example, the code below returns false:

var html = "";
var elems = document.getElementsByTagName("*");
for (var i = 0; i < elems.length; i++) {
  html += elems[i].outerHTML;
}

alert(document.body.outerHTML == html);

I also noticed that the code above wasn't giving me ALL the HTML tags individually; it grouped them into one tag. For example:

var html = "";
var elems = document.getElementsByTagName("*");
alert(elems[0].outerHTML);

I tried fixing this by recursively looking through each element's children, but I couldn't get that to work.

Ideally, I would like to get every individual tag, rather than tags wrapped inside other tags. I'm fairly new to JavaScript, so any advice, explanations, or example code (in pure JavaScript if possible) about what I'm doing wrong would be really helpful. I also realize my approach might be completely wrong, so better ideas are welcome.

Upvotes: 2

Views: 2686

Answers (2)

pid

Reputation: 11607

What you need is Douglas Crockford's well-known walkTheDOM function:

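function walkTheDOM(node, func)
{
  func(node);
  node = node.firstChild;
  while (node)
  {
    // Read the next sibling before recursing: if func removes node,
    // node.nextSibling becomes null and the remaining siblings would be skipped
    var next = node.nextSibling;
    walkTheDOM(node, func);
    node = next;
  }
}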

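func is called once for every node under (and including) the starting node: elements, text nodes, comments and so on, visited depth-first in document order. You can filter, transform or do whatever else you need by passing in the appropriate function.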
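
To make "passing in the appropriate function" concrete, here is a small sketch that uses walkTheDOM to list every element under <body> one tag at a time, which is the per-tag view you were after rather than nested outerHTML strings. makeTagCollector is a name made up for this example, and this copy of walkTheDOM reads nextSibling before recursing so the walk keeps going even if a callback removes the current node.

```javascript
// walkTheDOM as above: nextSibling is read before recursing so that a
// callback which removes the current node does not end the loop early.
function walkTheDOM(node, func)
{
  func(node);
  node = node.firstChild;
  while (node)
  {
    var next = node.nextSibling;
    walkTheDOM(node, func);
    node = next;
  }
}

// Hypothetical helper for this example: returns a callback that records
// the lowercase tag name of every element node (nodeType 1) it visits.
function makeTagCollector()
{
  var names = [];
  return {
    names: names,
    visit: function (node)
    {
      if (node.nodeType === 1) // ELEMENT_NODE
      {
        names.push(node.nodeName.toLowerCase());
      }
    }
  };
}

// Only runs in a browser context (e.g. a content script): list every element under <body>.
if (typeof document !== "undefined" && document.body)
{
  var collector = makeTagCollector();
  walkTheDOM(document.body, collector.visit);
  console.log(collector.names);
}
```

Each entry in collector.names corresponds to one element, so you can inspect or act on tags individually instead of reading their serialized outerHTML.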

To remove text nodes containing a specific string, you could do:

function removeAll(node)
{
    // protect against "node === undefined"
    if (node && node.nodeType === 3) // TEXT_NODE
    {
        if (node.textContent.indexOf(filter) !== -1) // contains offending text
        {
            node.parentNode.removeChild(node);
        }
    }
}

You can use it like this:

var filter = "the offending text";
walkTheDOM(document.body, removeAll);

If you want to parameterize the offending text, you can do that too, by turning removeAll into a closure that you create for each filter string.
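
One way to do that, sketched here with a factory function whose name (makeRemoveAll) is made up for this example, is to return the removeAll callback from a function so that it closes over its own filter instead of reading a global variable:

```javascript
// makeRemoveAll is a hypothetical name: it returns a removeAll-style
// callback that captures its own filter string instead of reading a global.
function makeRemoveAll(filter)
{
    return function removeAll(node)
    {
        // protect against "node === undefined"
        if (node && node.nodeType === 3) // TEXT_NODE
        {
            if (node.textContent.indexOf(filter) !== -1) // contains offending text
            {
                node.parentNode.removeChild(node);
            }
        }
    };
}

// Browser usage, assuming walkTheDOM from above is defined:
// walkTheDOM(document.body, makeRemoveAll("the offending text"));
```

Each call to makeRemoveAll produces an independent callback, so you can run several walks with different filter strings without them interfering.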

Upvotes: 2

Josh Beam

Reputation: 19812

A reference to a DOM element in JavaScript points to the actual node in the page, not a copy, so removing a node through that reference removes it from the document. That means you can do something like this (see the working jsfiddle):

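// Copy the live HTMLCollection into a plain array first, so removing
// nodes while looping doesn't change the list being iterated
Array.prototype.slice.call(document.getElementsByTagName('*')).forEach(function(node) {
    if(node.innerHTML === 'Hello') {
        node.parentNode.removeChild(node);
    }
});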

Obviously, node.innerHTML === 'Hello' is just an example; you'll probably want to decide how to match the text content (perhaps with a regular expression?).
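
If you do switch to a regular expression, keep in mind that an element's textContent (and innerHTML) includes everything nested inside it, so testing /hello/i against it would also match <body> and <html> and remove the whole page. A sketch that avoids this by testing only the text sitting directly inside each element (ownText and removeElementsMatching are names made up for this example):

```javascript
// Hypothetical helper: concatenates only the text nodes that are direct
// children of el, ignoring text nested inside child elements.
function ownText(el) {
    var text = '';
    for (var child = el.firstChild; child; child = child.nextSibling) {
        if (child.nodeType === 3) { // TEXT_NODE
            text += child.nodeValue;
        }
    }
    return text;
}

// Removes every element in `elements` whose own text matches `pattern`
// and returns how many were removed. Avoid the g flag on `pattern`:
// with g, test() resumes from the lastIndex left by the previous call.
function removeElementsMatching(elements, pattern) {
    // Copy first: getElementsByTagName returns a live collection
    var list = Array.prototype.slice.call(elements);
    var removed = 0;
    list.forEach(function(node) {
        if (node.parentNode && pattern.test(ownText(node))) {
            node.parentNode.removeChild(node);
            removed++;
        }
    });
    return removed;
}

// Usage in a browser:
// removeElementsMatching(document.getElementsByTagName('*'), /hello/i);
```

Matching on direct text only means a <div> wrapping a matching <span> is left alone while the <span> itself is removed; if you want to remove the wrapper instead, you would need a different rule.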

Upvotes: 1
