Reputation: 147
Just to clarify what I'm trying to do: I'm making a Chrome extension that loops through the HTML of the current page and removes HTML tags containing certain text. But I'm having trouble looping through every HTML tag.
I've done a bunch of searching for the answer and pretty much every answer says to use:
var items = document.getElementsByTagName("*");
for (var i = 0; i < items.length; i++) {
    // do stuff
}
However, I've noticed that if I rebuild the HTML from the page using the elements in "items," I get something different than the page's actual HTML.
For example, the code below alerts false:
var html = "";
var elems = document.getElementsByTagName("*");
for (var i = 0; i < elems.length; i++) {
    html += elems[i].outerHTML;
}
alert(document.body.outerHTML == html);
And the code below alerts the HTML of the entire page rather than a single tag:
var elems = document.getElementsByTagName("*");
alert(elems[0].outerHTML);
Ideally, I would like to be able to get every individual tag, rather than ones wrapped in other tags. I'm fairly new to JavaScript, so any advice, explanations, or example code (in pure JavaScript if possible) showing what I'm doing wrong would be really helpful. I also realize my approach might be completely wrong, so any better ideas are welcome.
Upvotes: 2
Views: 2686
Reputation: 11607
What you need is Douglas Crockford's famous walkTheDOM:
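function walkTheDOM(node, func)
{
    func(node);
    node = node.firstChild;
    while (node)
    {
        // save the next sibling first: once func removes a node,
        // that node's nextSibling is null and the loop would stop early
        var next = node.nextSibling;
        walkTheDOM(node, func);
        node = next;
    }
}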
For each node, func will be executed. You can filter, transform, or do anything else by passing in the appropriate function.
To remove text nodes containing specific text you would do:
function removeAll(node)
{
    // "filter" is a global variable that must be set before the walk starts
    // protect against "node === undefined"
    if (node && node.nodeType === 3) // TEXT_NODE
    {
        if (node.textContent.indexOf(filter) !== -1) // contains offending text
        {
            node.parentNode.removeChild(node);
        }
    }
}
You can use it like this:
var filter = "the offending text";
walkTheDOM(document.body, removeAll);
If you want to parameterize by the offending text, you can do that too, by turning removeAll into a closure that is created with the text to match.
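A minimal sketch of that closure idea, assuming a hypothetical factory named makeRemover (the name is not from Crockford's code or any library). The walk here saves each child's next sibling before calling the callback, because a node that has been removed no longer has a nextSibling:

```javascript
// Depth-first walk over a node and all of its descendants.
function walkTheDOM(node, func) {
    func(node);
    var child = node.firstChild;
    while (child) {
        var next = child.nextSibling; // saved before func can remove child
        walkTheDOM(child, func);
        child = next;
    }
}

// makeRemover is a hypothetical name: it returns a callback that
// remembers filterText, so no global variable is needed.
function makeRemover(filterText) {
    return function (node) {
        if (node && node.nodeType === 3 && // 3 === TEXT_NODE
            node.textContent.indexOf(filterText) !== -1) {
            node.parentNode.removeChild(node);
        }
    };
}

// Only runs where a DOM is available, e.g. in a content script.
if (typeof document !== "undefined" && document.body) {
    walkTheDOM(document.body, makeRemover("the offending text"));
}
```

Each call to makeRemover produces an independent callback, so several filters can be applied with separate walks without sharing state.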
Upvotes: 2
Reputation: 19812
References to DOM elements in JavaScript point to the actual nodes in the page, so removing a node through its reference removes it from the document. You can do something like this (see the working jsfiddle):
// copy the live collection into an array before removing anything
Array.prototype.slice.call(document.getElementsByTagName('*')).forEach(function(node) {
    if (node.innerHTML === 'Hello') {
        node.parentNode.removeChild(node);
    }
});
Obviously node.innerHTML === 'Hello' is just an example, so you'll probably want to figure out how you want to match the text content (perhaps with a regular expression?).
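One possible way to do the matching with a regular expression is sketched below. The helper name removeMatching and the sample pattern are assumptions for illustration. It only tests elements that have no child elements, since textContent on an ancestor such as the body also includes the text of everything nested inside it:

```javascript
// removeMatching is a hypothetical helper, not part of any library.
// Returns the number of elements it removed.
function removeMatching(root, pattern) {
    // Copy the live collection so removals do not affect the iteration.
    var all = Array.prototype.slice.call(root.getElementsByTagName('*'));
    var removed = 0;
    all.forEach(function (node) {
        var isLeaf = node.children.length === 0; // no nested elements
        if (isLeaf && node.parentNode && pattern.test(node.textContent)) {
            node.parentNode.removeChild(node);
            removed++;
        }
    });
    return removed;
}

// Only runs where a DOM is available, e.g. in a content script.
if (typeof document !== "undefined" && document.body) {
    removeMatching(document.body, /hello/i); // sample pattern, case-insensitive
}
```

Avoid the g flag on the pattern here: a global regular expression keeps its position between test() calls, which can make consecutive checks give inconsistent results.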
Upvotes: 1