Reputation: 11676

Checking whether an HTML element contains primitive text?

Take this HTML:

<div id="el1">
  <div id="el2">
    <div id="el3">
      Hello
      <div id="el4">
        World
      </div>
    </div>
  </div>
</div>

Note that el3 and el4 contain primitive text; namely "Hello" and "World". The other elements (el1 and el2) only contain other elements.

And yet, using pure JavaScript, all of their innerHTML properties indicate they contain some form of text.

How can one use pure JavaScript to ascertain whether a particular element contains primitive text as a direct child? In this instance, the method should also recognise el3 as containing primitive text (even though it also contains another element thereafter).

Something like this:

var els = document.getElementsByTagName("*");

for(var i = 0; i < els.length; i++){

  if( /* element contains text */ ){

    // do something

  }
}

Is this really just a job for RegEx? With all the properties of an HTMLElement, you'd think there would be a better way.

No jQuery, thanks.

Upvotes: 5

Answers (3)

talemyn

Reputation: 7960

Here's an example of how you can use the nodeType to help you get your answer:

var els = document.getElementsByTagName("*");

for (var i = 0; i < els.length; i++) {
    var hasTextNode = false;
    var currChildren = els[i].childNodes;

    for (var j = 0; j < currChildren.length; j++) {
        if ((currChildren[j].nodeType === Node.TEXT_NODE) &&
            (!(/^\s*$/.test(currChildren[j].textContent)))) {
                hasTextNode = true;
                break;
        }
    }

    window.console.log(els[i].id + ((hasTextNode) ? " has" : " does not have") + " a Text Node");
}

Applying that to the HTML that you provided results in this in the console:

el1 does not have a Text Node
el2 does not have a Text Node
el3 has a Text Node
el4 has a Text Node

Note: it is important to check the found Text Nodes for "space only" content, because the DOM will consider all of the indenting and line breaks in the source code as a "Text Node". Obviously, you would want to ignore those.
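
The same check can be packaged as a predicate that slots into the loop from the question. This is only a minimal sketch, not a built-in: hasDirectText is a made-up name, and it assumes it is handed a node-like object exposing childNodes, nodeType and nodeValue (which real DOM elements do).

```javascript
// Hypothetical helper (not part of any DOM API): returns true when `el`
// has at least one direct child text node that is not whitespace-only.
function hasDirectText(el) {
    var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
    var children = el.childNodes;

    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        // /\S/ matches any non-whitespace character, so indentation
        // and line breaks from the source do not count as text.
        if (child.nodeType === TEXT_NODE && /\S/.test(child.nodeValue)) {
            return true;
        }
    }
    return false;
}
```

In the question's loop, the placeholder condition would then become if (hasDirectText(els[i])) { ... }.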

Upvotes: 2

adeneo

Reputation: 318312

innerHTML returns the HTML markup, and every element except the innermost one contains HTML because the elements are nested.

For instance, the innerHTML of #el2 would be

  <div id="el3">
      Hello
      <div id="el4">
          World
      </div>
  </div>

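To get just the text, use textContent (supported by all modern browsers) or innerText (originally an Internet Explorer property that older versions of Firefox lacked).
Both include the text of descendant elements as well as the whitespace from the source indentation, so remove the child elements from a clone and trim() what is left. Something like this (it assumes the markup sits inside an element with the id wrapper):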

var els = document.querySelectorAll("#wrapper *");

for(var i = 0; i < els.length; i++){
    var el = els[i].cloneNode(true);
    var children = el.children;

    for (var j=children.length; j--;) el.removeChild(children[j]);
    var content = el.innerText ? el.innerText  : el.textContent;

    if( content.trim().length ){
        // do something
        console.log(els[i].getAttribute('id') + ' has text');
    }
}


Or check the nodeType and nodeValue of the child nodes directly:

var els = document.querySelectorAll("#wrapper *");

for(var i = 0; i < els.length; i++){
    var el = els[i];
    var children = el.childNodes;

    for (var j=children.length; j--;) {
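        // nodeType 3 is a text node; skip whitespace-only text
        if( children[j].nodeType === 3 && children[j].nodeValue.trim().length) {
            // do something
            console.log(els[i].getAttribute('id') + ' has text');
            break; // one match is enough, so stop scanning this element
        }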
    }
}


Upvotes: 2

Dawn

Reputation: 941

You can tell the difference between element nodes and text nodes via the nodeType property. myelementnode.nodeType will be 1, mytextnode.nodeType will be 3.

As the name suggests, getElementsByTagName will only give you element nodes. What you want to do is use the childNodes property of your root node, which gives you all immediate children of that node as a NodeList. So, for el1 you will get its one element child, el2, along with whitespace-only text nodes for the line breaks and indentation around it.

You then have to recursively go through each child node to get its children until you hit a node with type 3 (text).

So for el3, the first child node will be your text and the second will be your el4 element (followed by a whitespace-only text node for the line break before the closing tag). You'd then need to go into el4 to get its child node.

innerHTML returns a string (a chunk of HTML serialized as text), not nodes. You could use that and a regular expression to discard everything that sits between < and >, but that is a bit crude, and with large chunks of HTML it will be an expensive process.
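
The recursive walk over childNodes described above might look roughly like the following. Treat it as a sketch under assumptions rather than a definitive implementation: findElementsWithText is a made-up name, the numeric literals 1 and 3 stand in for Node.ELEMENT_NODE and Node.TEXT_NODE, and whitespace-only text nodes are deliberately ignored.

```javascript
// Hypothetical helper: walks the tree starting at `root` and returns, in
// document order, every element that has a direct non-whitespace text child.
function findElementsWithText(root, found) {
    found = found || [];
    var children = root.childNodes;
    var i;

    // First pass: does this element itself hold real text?
    for (i = 0; i < children.length; i++) {
        if (children[i].nodeType === 3 && /\S/.test(children[i].nodeValue)) {
            found.push(root);
            break;
        }
    }

    // Second pass: descend into child elements only (nodeType 1).
    for (i = 0; i < children.length; i++) {
        if (children[i].nodeType === 1) {
            findElementsWithText(children[i], found);
        }
    }
    return found;
}
```

In a browser, calling findElementsWithText(document.getElementById("el1")) on the question's markup should yield the el3 and el4 elements.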

Upvotes: 1
