brunoais

Reputation: 6846

DOM navigation: eliminating the text nodes

I have a JavaScript script that reads and parses XML. It obtains the XML through an XMLHttpRequest (which calls a PHP script that returns XML). The script is supposed to receive 2 or more nodes under the first parentNode. The 2 nodes it requires have well-defined names; the other ones can have any name. The output from the PHP script may be:

<?xml version='1.0'?>
<things>
    <carpet>
        <id>1</id>
        <name>1</name>
        <desc>1.5</desc>
    </carpet>
    <carpet>
        <id>2</id>
        <name>2</name>
        <height>unknown</height>
    </carpet>
</things>

Here every carpet has 7 child nodes (3 elements plus 4 whitespace-only text nodes).

But it may also be:

<?xml version='1.0'?>
<things>
    <carpet>
        <id>1</id>
        <name>1</name>
        <desc>1.5</desc>
    </carpet>
    <carpet><id>2</id><name>2</name><height>unknown</height></carpet>
</things>

Here the first carpet has 7 nodes, while the second carpet has 3. I want my JavaScript code to treat both exactly the same way, quickly and cleanly. If possible, I'd like to remove all the text nodes between tags, so that XML like the above would always be treated as:

<?xml version='1.0'?>
    <things><carpet><id>1</id><name>1</name><desc>1.5</desc></carpet><carpet><id>2</id><name>2</name><height>unknown</height></carpet></things>

Is that possible in a quick and efficient way? If possible, and if it is more efficient, I'd like to avoid the get functions (getElementsByTagName(), getElementById(), ...).

Upvotes: 3

Views: 1758

Answers (2)

T.J. Crowder

Reputation: 1074949

It's pretty straightforward to walk the DOM and remove the nodes you consider empty (containing only whitespace).

This has now been tested and fixed (live copy here). It looks something like this (replace those magic numbers with symbols, obviously):

var reBlank = /^\s*$/;
function walk(node) {
    var child, next;
    switch (node.nodeType) {
        case 3: // Text node
            if (reBlank.test(node.nodeValue)) {
                node.parentNode.removeChild(node);
            }
            break;
        case 1: // Element node
        case 9: // Document node
            child = node.firstChild;
            while (child) {
                next = child.nextSibling;
                walk(child);
                child = next;
            }
            break;
    }
}
walk(xmlDoc); // Where xmlDoc is your XML document instance

Here, my definition of "blank" is any text that contains only whitespace according to the JavaScript interpreter's understanding of the \s (whitespace) RegExp class. Note that in some implementations \s is not inclusive enough (several Unicode "blank" characters outside the ASCII range are not matched, etc.), so be sure to test with your sample data.
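
As a rough sketch of the two points above (magic numbers and the meaning of "blank"): the variant below uses named constants for the node types and, instead of \s, tests only the four characters that XML 1.0 itself treats as whitespace (space, tab, carriage return, line feed). The names isXmlBlank and stripBlankTextNodes are made up for illustration, and choosing the XML definition of whitespace is an assumption about what you want, not the only reasonable choice.

```javascript
// Named constants for the nodeType values used in walk() above. Browsers
// expose the same values as Node.ELEMENT_NODE, Node.TEXT_NODE and
// Node.DOCUMENT_NODE.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;
var DOCUMENT_NODE = 9;

// Whitespace as defined by XML 1.0: space, tab, carriage return, line feed.
var reXmlBlank = /^[ \t\r\n]*$/;

// Hypothetical helper: true when the text is empty or holds only XML whitespace.
function isXmlBlank(text) {
    return reXmlBlank.test(text);
}

// Same traversal as walk() above, with the constants and isXmlBlank() swapped in.
function stripBlankTextNodes(node) {
    var child, next;
    switch (node.nodeType) {
        case TEXT_NODE:
            if (isXmlBlank(node.nodeValue)) {
                node.parentNode.removeChild(node);
            }
            break;
        case ELEMENT_NODE:
        case DOCUMENT_NODE:
            child = node.firstChild;
            while (child) {
                // Grab the sibling first, since child may be removed by the call.
                next = child.nextSibling;
                stripBlankTextNodes(child);
                child = next;
            }
            break;
    }
}
```

You would call it the same way, e.g. stripBlankTextNodes(xmlDoc). With this definition a text node holding only a non-breaking space is kept, because that character is not XML whitespace.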

Upvotes: 6

Liv

Reputation: 6124

I would just try a very crude string replace. Assuming you store the XML text in a variable called xml:

var rex = /(\<(\/)?[A-Za-z0-9]+\>)(\s)+/gi;
var a = xml.replace( rex, "$1" );

Here's the complete test I put together:

<html><head></head>

<body>
<script type="text/javascript">
var xml = "<?xml version='1.0'?>\n" + 
"<things>\n" +
"    <carpet>\n" +
"        <id>1</id>\n" +
"        <name>1</name>\n" +
"        <desc>1.5</desc>\n" +
"    </carpet>\n" +
"    <carpet>\n" +
"        <id>2</id>\n" +
"        <name>2</name>\n" +
"        <height>unknown</height>\n" +
"    </carpet>\n" +
"</things>";

var rex = /(\<(\/)?[A-Za-z0-9]+\>)(\s)+/gi;
var a = xml.replace( rex, "$1" );
alert( a );

</script>


</body></html>
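
One thing to watch with the pattern above: it only matches tag names made of letters and digits with no attributes, and it also strips leading whitespace from real text content (for example "<name> Blue rug</name>" loses the space after the opening tag). A slightly different sketch, which is a substitute technique rather than the regex above, removes whitespace only when it sits directly between a ">" and a "<". The function name collapseBetweenTags is hypothetical, and this is still a crude string approach: it would also touch comments or CDATA sections that happen to contain "> <".

```javascript
// Hypothetical helper: remove whitespace-only runs that sit between two tags,
// leaving whitespace inside text content alone.
function collapseBetweenTags(xml) {
    return xml.replace(/>\s+</g, "><");
}

var sample = "<things>\n" +
    "    <carpet>\n" +
    "        <id>1</id>\n" +
    "        <name> Blue rug</name>\n" +
    "    </carpet>\n" +
    "</things>";

var collapsed = collapseBetweenTags(sample);
// collapsed is now:
// <things><carpet><id>1</id><name> Blue rug</name></carpet></things>
```

The result would still have to be parsed (for example with DOMParser in a browser) before you can navigate it as a DOM.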

Upvotes: 0
