"False" text node causing trouble

Question

I'm working on a DOM traversal type of script and I'm almost finished with it. However, there is one problem that I've encountered and for the life of me, I can't figure out what to do to fix it. Pardon my ineptitude, as I'm brand new to JS/JQuery and I'm still learning the ropes.

Basically, I'm using Javascript/JQuery to create an "outline", representing the structure of an HTML page, and appending the "outline" to the bottom of the webpage. For example, if the HTML is this...


    
    
    
        Hello World

Then the output should be an unordered list like this:

html
- head
- body
  - h1
    - text(Hello World)
  - script src(”http://code.jquery.com/jquery-2.1.0.min.js”) type(”text/javascript”)
  - script src(”outline.js”) type(”text/javascript”)

Here's what I've got so far:

var items=[];
$(document).ready(function(){
    $("").appendTo("body");

    traverse(document, function (node) { 
        if(node.nodeName.indexOf("#") <= -1){
            items.push(""+""+node.nodeName.toLowerCase());
        }
        else {
            var x = "text("+node.nodeValue+")";
            if(node.nodeValue == null) {
                items.push("
 document");
            }
            else if(/[a-z0-9]/i.test(node.nodeValue) && node.nodeValue != null) {
                items.push(""+ x +"");
            }
            else {
                items.push("");
            }
        }
    });
    $('#list').append(items.join(''));
});

function traverse(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        traverse(node, func);
        node = node.nextSibling;
    }
}

It works almost perfectly, except it seems to read a carriage return as a text node. For example, if there's


</code></pre>

<p>it reads that properly, adding head as an unordered list element, and then creating a new "unordered list" for title, which is nested inside the header. HOWEVER, if it's </p>

<pre><code><head>
    <title>
</code></pre>

<p>It makes the new unordered list and its element, "head", but then jumps to the else statement that does <code>items.push(</ul>)</code> . How do I get it to ignore the carriage return? I tried testing to see if the nodeValue was equal to the carriage return, \r, but that didn't seem to do the trick. </p>

jfriend00 · Accepted Answer

I'm having a bit of a hard time understanding exactly which text nodes you want to skip. If you just want to skip a text node that is only whitespace, you can do that like this:

var onlyWhitespaceRegex = /^\s*$/;

traverse(document, function (node) { 
    if (node.nodeType === 3 && onlyWhitespaceRegex.test(node.nodeValue) {
        // skip text nodes that contain only whitespace
        return;
    }
    else if (node.nodeName.indexOf("#") <= -1){
        items.push(""+""+node.nodeName.toLowerCase());
    } else ...

Or, maybe you just want to trim any multiple leading or trailing whitespaces off a text node before displaying it since it may not display in HTML.

var trimWhitespaceRegex = /^\s+|\s+$/g;

traverse(document, function (node) { 
    if(node.nodeName.indexOf("#") <= -1){
        items.push(""+""+node.nodeName.toLowerCase());
    } else {
        var text = node.nodeValue;
        if (node.nodeType === 3) {
            text = text.replace(trimWhitespaceRegex, " ");
        }
        var x = "text("+text+")";
        if(node.nodeValue == null) {
            items.push(" document");
        } ....

A further description of exactly what you're trying to achieve in the output for various forms of different text nodes would help us better understand your requirements.

"False" text node causing trouble

Answers (1)

Related Questions

&quot;False&quot; text node causing trouble

Answers (1)

Related Questions

"False" text node causing trouble