DOM navigation: eliminating the text nodes

Question

I have a JavaScript script that reads and parses XML. It obtains the XML through an XMLHttpRequest to a PHP script, which returns XML. The script should receive two or more child nodes under the first parent node. Two of those nodes have well-defined names; the others can have any name. The output from the PHP script may be:

Here, every carpet has 7 child nodes.

But it may also be:

Here, the first carpet has 7 child nodes and the second carpet has 3. I want my JavaScript code to handle both cases exactly the same way, quickly and cleanly. If possible, I'd like to remove all the text nodes between tags, so that markup like the above would always be treated as:


    111.522unknown

Is that possible in a quick and efficient way? I'd prefer not to use any of the get functions (getElementsByTagName(), getElementById(), ...) if that is possible and more efficient.

T.J. Crowder · Accepted Answer

It's pretty straightforward to walk the DOM and remove the nodes you consider empty (containing only whitespace).

This was originally untested (it has since been tested and fixed), but it would look something like this (replace those magic numbers with symbols, obviously):

var reBlank = /^\s*$/;
function walk(node) {
    var child, next;
    switch (node.nodeType) {
        case 3: // Text node
            if (reBlank.test(node.nodeValue)) {
                node.parentNode.removeChild(node);
            }
            break;
        case 1: // Element node
        case 9: // Document node
            child = node.firstChild;
            while (child) {
                next = child.nextSibling;
                walk(child);
                child = next;
            }
            break;
    }
}
walk(xmlDoc); // Where xmlDoc is your XML document instance
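For comparison, here is a sketch of the same cleanup done iteratively, with an explicit stack instead of recursion, which avoids deep call stacks on heavily nested documents. As in walk(), the next sibling is read before a node is removed. The numeric constants mirror the browser's Node.ELEMENT_NODE, Node.TEXT_NODE and Node.DOCUMENT_NODE. So that the sketch runs outside a browser, it includes FakeNode, a hypothetical stand-in that implements only the properties stripBlankText() uses; in a browser you would pass a real XML document instead (for example xhr.responseXML). The element names carpets, carpet and id are likewise hypothetical.

```javascript
// Node-type constants (browsers expose these as Node.ELEMENT_NODE etc.).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const DOCUMENT_NODE = 9;

const reBlank = /^\s*$/;

// Iterative variant of walk(): only element children are pushed onto the
// stack, and text nodes consisting entirely of whitespace are removed.
function stripBlankText(root) {
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    let child = node.firstChild;
    while (child) {
      const next = child.nextSibling; // read before child is removed
      if (child.nodeType === TEXT_NODE && reBlank.test(child.nodeValue)) {
        node.removeChild(child);
      } else if (child.nodeType === ELEMENT_NODE) {
        stack.push(child);
      }
      child = next;
    }
  }
  return root;
}

// Hypothetical minimal stand-in for DOM nodes, so the sketch runs without
// a browser. It implements only what stripBlankText() touches.
class FakeNode {
  constructor(nodeType, nodeName, nodeValue = null) {
    this.nodeType = nodeType;
    this.nodeName = nodeName;
    this.nodeValue = nodeValue;
    this.parentNode = null;
    this.childNodes = [];
  }
  get firstChild() {
    return this.childNodes[0] || null;
  }
  get nextSibling() {
    if (!this.parentNode) return null;
    const siblings = this.parentNode.childNodes;
    return siblings[siblings.indexOf(this) + 1] || null;
  }
  appendChild(child) {
    child.parentNode = this;
    this.childNodes.push(child);
    return child;
  }
  removeChild(child) {
    this.childNodes.splice(this.childNodes.indexOf(child), 1);
    child.parentNode = null;
    return child;
  }
}

// Builds an element; string arguments become text nodes.
function el(name, ...children) {
  const node = new FakeNode(ELEMENT_NODE, name);
  for (const c of children) {
    node.appendChild(typeof c === "string" ? new FakeNode(TEXT_NODE, "#text", c) : c);
  }
  return node;
}

// Hypothetical document; the "\n  " strings mimic pretty-printed XML.
const doc = new FakeNode(DOCUMENT_NODE, "#document");
doc.appendChild(
  el("carpets", "\n  ", el("carpet", "\n    ", el("id", "1"), "\n  "), "\n")
);

stripBlankText(doc);

// After cleanup, firstChild/nextSibling land directly on elements.
const carpets = doc.firstChild;
const carpet = carpets.firstChild;
console.log(carpets.childNodes.length);               // 1
console.log(carpet.firstChild.nodeName);              // id
console.log(carpet.firstChild.firstChild.nodeValue);  // 1
```

Once the blank text nodes are gone, the children can be read with firstChild and nextSibling alone, with no getElementsByTagName() call. Text that contains anything other than whitespace (the "1" above) is left in place.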

Here, my definition of "blank" is any text node that contains only whitespace, as defined by the JavaScript engine's \s (whitespace) RegExp class. Note that some implementations have had problems with \s not being inclusive enough (for example, not matching several Unicode whitespace characters outside the ASCII range), so be sure to test with your sample data.
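One quick way to run that check is to apply the regex to a few candidate characters in the engine you target. Under ES5 and later, \s is specified to match the line terminators, U+00A0, U+FEFF and the Unicode space separators (such as U+2003), but not U+200B (zero-width space), which is a format character rather than whitespace. The reBlankStrict pattern at the end is a hypothetical alternative that lists extra characters explicitly, so the result depends less on what a given engine puts into \s.

```javascript
// Same blank test as in walk().
const reBlank = /^\s*$/;

// Sample strings mixing ASCII and non-ASCII whitespace-like characters.
const samples = [
  ["space, tab, CR, LF", " \t\r\n"],
  ["no-break space U+00A0", "\u00A0"],
  ["em space U+2003", "\u2003"],
  ["line separator U+2028", "\u2028"],
  ["byte order mark U+FEFF", "\uFEFF"],
  ["zero-width space U+200B", "\u200B"],
];

// Prints how the current engine classifies each sample.
for (const [label, text] of samples) {
  console.log(`${label}: ${reBlank.test(text) ? "blank" : "not blank"}`);
}

// Hypothetical stricter variant: extra characters listed explicitly.
const reBlankStrict = /^[\s\u00A0\u200B\uFEFF]*$/;
console.log(reBlankStrict.test("\u200B\u00A0 ")); // true
```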
