Mick M

Reputation: 43

How do I get the entire HTML document as a string excluding some elements?

I am looking for a way to get the entire HTML document as a string, excluding a few elements (possibly marked with a class name such as 'exclude'). I know I can grab the entire document with document.documentElement.innerHTML, document.documentElement.outerHTML, or document.getElementsByTagName('html')[0].innerHTML.

What I am still struggling with is how to exclude some of the nodes (such as buttons, divs, or any other tags that share a common class name) before I get the innerHTML.

Upvotes: 4

Views: 607

Answers (3)

cнŝdk

Reputation: 32145

You can use querySelectorAll() with the :not() CSS selector on your HTML block to leave unwanted elements out of the selection.

var content = document.getElementsByTagName('html')[0]
var selection = content.querySelectorAll('*:not(.ignore)');

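Then use outerHTML on the first match (the html element itself) to get the content as a string. Be aware that outerHTML serializes all descendants, so elements with the ignore class nested inside it still appear in the result: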

var htmlString = selection[0].outerHTML;

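Alternatively, you can loop over the selected elements and append each one's HTML to your result string (nested elements then contribute their markup more than once, because a parent's innerHTML already contains its children):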

var htmlString = "";
selection.forEach(function(el) {
  htmlString += el.innerHTML;
});

Demo:

var content = document.getElementsByTagName('html')[0]
var selection = content.querySelectorAll('*:not(.ignore)');

//Then log the selection content
console.log(selection[0].outerHTML);

//Or loop through the elements and collect their contents
var htmlString = "";
selection.forEach(function(el) {
  htmlString += el.innerHTML;
});

console.log(htmlString);

Note:

  • In this demo there are no elements with the ignore class, but you can add some and test.
  • As you can see, the output also keeps every other HTML element, including script and style tags.
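If script and style tags should be left out of the selection as well, one option is to chain a :not() per unwanted selector. The following is only a sketch: buildNotSelector is a hypothetical helper name, not part of the answer above, and it only changes which elements querySelectorAll matches (it does not strip anything from a matched ancestor's markup).

```javascript
// Hypothetical helper (an assumption for illustration): builds a selector that
// matches every element except those matching one of the given simple selectors.
function buildNotSelector(excluded) {
  // Chain one :not() per entry, e.g. "*:not(.ignore):not(script)".
  return "*" + excluded.map(function (s) {
    return ":not(" + s + ")";
  }).join("");
}

var selector = buildNotSelector([".ignore", "script", "style"]);
console.log(selector); // *:not(.ignore):not(script):not(style)

// In a browser, the selector can be passed to querySelectorAll as before.
if (typeof document !== "undefined") {
  var filtered = document.documentElement.querySelectorAll(selector);
  console.log(filtered.length);
}
```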

Upvotes: 0

Alex

Reputation: 2232

I know I'm late to the party, but here is my contribution; I used cнŝdk's idea to implement it.


let markup = document.querySelectorAll('*:not(.exclude)')[0].innerHTML;

console.log("Data Type: " + typeof(markup));
console.log(markup);
<center>
  <div>Hello World</div>
  <div class="exclude">Hello World [Exclude Me]</div>
  <div>Hello World</div>
  <div>Hello World</div>
  <div>Hello World</div>
  <div class="exclude">Hello World [Exclude Me]</div>
  <div class="exclude">Hello World [Exclude Me]</div>
  <div>Hello World</div>
  <div>Hello World</div>
  <div class="exclude">Hello World [Exclude Me]</div>
</center>

Upvotes: 0

T.J. Crowder

Reputation: 1074285

I'd probably clone the whole tree, then remove the elements you don't want:

var clone = document.body.cloneNode(true);
clone.querySelectorAll(".exclude").forEach(function(element) {
    element.parentNode.removeChild(element);
});
var html = clone.outerHTML;

Note that this assumes body, itself, doesn't have the exclude class.
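The same clone-and-remove idea can be wrapped in a small function that works for any root, for example document.documentElement when the whole document is wanted. This is a minimal sketch: htmlWithout is a hypothetical name, and returning an empty string when the root itself matches the selector is an assumption about the desired behavior, not something stated in the answer.

```javascript
// Hypothetical helper: returns the markup of `root` with every descendant
// matching `selector` removed. The live DOM is left untouched.
function htmlWithout(root, selector) {
  // Assumption: if the root itself is excluded, nothing is left to serialize.
  if (root.matches(selector)) {
    return "";
  }
  var clone = root.cloneNode(true); // deep copy, so removals affect only the copy
  clone.querySelectorAll(selector).forEach(function (element) {
    // Elements nested inside an already removed element still have a parent
    // (the detached one), so removing them again is harmless.
    element.parentNode.removeChild(element);
  });
  return clone.outerHTML;
}

// Browser-only usage: serialize the whole document without .exclude elements.
if (typeof document !== "undefined") {
  console.log(htmlWithout(document.documentElement, ".exclude"));
}
```

Passing document.body instead of document.documentElement gives the same result as the snippet above, minus the head element.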

Example:

var clone = document.body.cloneNode(true);
// Snippet-specific: Also remove the script
clone.querySelectorAll(".exclude, script").forEach(function(element) {
    element.parentNode.removeChild(element);
});
var html = clone.outerHTML;
console.log(html);
<div>
  I want this
  <div>And this</div>
</div>
<div class="exclude">
  I don't want this
  <div>Or this, since its parent is excluded</div>
</div>

Upvotes: 5
