Otra

Reputation: 8158

Why are there #text nodes in my xml file?

I'm making an Android application that does DOM parsing on an XML file. The XML file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<family>
    <grandparent>
        <parent1>
            <child1>Foo</child1>
            <child2>Bar</child2>
        </parent1>
        <parent2>
            <child1>Raz</child1>
            <child2>Mataz</child2>
        </parent2>
    </grandparent>  
</family>

If I run a DOM parser on it like this:

try {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

    Document doc = builder.parse(input);
    doc.getDocumentElement().normalize();   // added in since the edit
    NodeList nodd = doc.getElementsByTagName("grandparent");
    for (int x = 0; x < nodd.getLength(); x++) {
        Node node = nodd.item(x);
        NodeList nodes = node.getChildNodes();
        for (int y = 0; y < nodes.getLength(); y++) {
            Node n = nodes.item(y);
            System.out.println(n.getNodeName());
        }
    }
} catch (ParserConfigurationException | SAXException | IOException e) {
    e.printStackTrace();
}

My application prints the following:

07-20 18:24:28.395: INFO/System.out(491): #text
07-20 18:24:28.395: INFO/System.out(491): parent1
07-20 18:24:28.395: INFO/System.out(491): #text
07-20 18:24:28.395: INFO/System.out(491): parent2
07-20 18:24:28.395: INFO/System.out(491): #text

My question is: what are those #text nodes, and more importantly, how do I get rid of them?

Edit: Now that I know what they are, I tried normalizing the document. I have updated the code to reflect the change, but I get the same result.

Upvotes: 10

Views: 9834

Answers (3)

jmartel

Reputation: 2781

This is what you get:

1) A node list containing all the grandparent elements:

NodeList nodd = doc.getElementsByTagName("grandparent");

2) All the child nodes of grandparent x:

NodeList nodes = node.getChildNodes();

which are the child nodes of

<grandparent>
    <parent1>
       ...
    </parent1>

    <parent2>
       ...
    </parent2>
</grandparent>

3) The child at index y:

nodes.item(y);

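There can be text between those elements, and that text is the #text you are seeing (in your file it is the newlines and indentation between the tags). For example, if you had: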

<grandparent>
    yourTextHere1
    <parent1>
       ...
    </parent1>
    yourTextHere2
    <parent2>
       ...
    </parent2>
    yourTextHere3
</grandparent>

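You would get these five child nodes:

yourTextHere1 parent1 yourTextHere2 parent2 yourTextHere3

The three pieces of text are text nodes, and getNodeName() returns "#text" for every text node, which is why your output shows #text rather than the text itself.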
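To see this directly, here is a small self-contained sketch (the class and method names are made up for illustration) that parses a one-line document with literal text between the two parent elements and lists the root's children. It uses only the standard javax.xml.parsers and org.w3c.dom APIs; the input string is a hypothetical stand-in for your file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TextNodeDemo {
    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Describes each child of the root element: elements by name,
    // text nodes as their node name ("#text") followed by their quoted value.
    static List<String> describeChildren(String xml) {
        NodeList children = parse(xml).getDocumentElement().getChildNodes();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                out.add(n.getNodeName() + " \"" + n.getNodeValue() + "\"");
            } else {
                out.add(n.getNodeName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: text sits before, between, and after the parents.
        String xml = "<grandparent>text1<parent1/>text2<parent2/>text3</grandparent>";
        describeChildren(xml).forEach(System.out::println);
    }
}
```

Running main prints #text "text1", parent1, #text "text2", parent2 and #text "text3", one per line. With an indented file instead of a one-line string, the values of those #text nodes are just newlines and spaces.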

I hope this helps! Julien

Upvotes: 1

Sharique Abdullah

Reputation: 655

Do this when parsing the document:

Document doc = builder.parse(input); 
doc.getDocumentElement().normalize();

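This puts the tree into "normal" form: adjacent #text nodes are merged into one and empty #text nodes are removed. Note, however, that it does not remove #text nodes that contain only whitespace (such as the indentation between your tags), which is why the output in the question is unchanged after adding this call.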
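A quick way to check what normalize() does and does not change is to count the root's child nodes before and after calling it. This is a sketch with hypothetical class and method names; the indented input string stands in for the file in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NormalizeDemo {
    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Number of child nodes under the root element, optionally after normalize().
    static int rootChildCount(String xml, boolean normalize) {
        Element root = parse(xml).getDocumentElement();
        if (normalize) {
            root.normalize();
        }
        return root.getChildNodes().getLength();
    }

    // normalize() does merge adjacent Text nodes; here two are created programmatically.
    static int mergedChildCount() {
        Document doc = parse("<root/>");
        Element root = doc.getDocumentElement();
        root.appendChild(doc.createTextNode("a"));
        root.appendChild(doc.createTextNode("b"));
        root.normalize();
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        // Indentation between the tags becomes whitespace-only Text nodes.
        String xml = "<grandparent>\n    <parent1/>\n    <parent2/>\n</grandparent>";
        System.out.println("before normalize(): " + rootChildCount(xml, false));
        System.out.println("after normalize():  " + rootChildCount(xml, true));
        System.out.println("adjacent text nodes after normalize(): " + mergedChildCount());
    }
}
```

Both counts come out as 5 (three whitespace #text nodes plus parent1 and parent2), while mergedChildCount() returns 1 because the two adjacent text nodes it creates are combined into a single node.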

Upvotes: 0

Ray Toal

Reputation: 88428

It's whitespace: the newlines, spaces, and tabs that indent the file each end up as a #text node between the elements. :)
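Since the extra nodes are just whitespace, the usual fix is to skip anything that is not an element while looping, or to strip the whitespace-only text nodes once after parsing. The sketch below shows both; the helper names are hypothetical, and treating any text whose trim() is empty as "whitespace only" is an assumption that suits indentation like the file in the question. (DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) only takes effect when the parser is validating against a DTD, so it will not help with this file as-is.)

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SkipWhitespaceDemo {
    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Option 1: keep only element children, skipping #text (and other non-element) nodes.
    static List<String> elementChildNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    // Option 2: recursively delete text nodes whose content is only whitespace.
    static void removeWhitespaceText(Node parent) {
        NodeList nodes = parent.getChildNodes();
        // Walk backwards so removing a node does not shift indices still to be visited.
        for (int i = nodes.getLength() - 1; i >= 0; i--) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) {
                parent.removeChild(n);
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                removeWhitespaceText(n);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<family>\n  <grandparent>\n    <parent1><child1>Foo</child1></parent1>\n"
                + "    <parent2><child1>Raz</child1></parent2>\n  </grandparent>\n</family>";
        Document doc = parse(xml);
        Node grandparent = doc.getElementsByTagName("grandparent").item(0);
        System.out.println(elementChildNames(grandparent)); // [parent1, parent2]

        removeWhitespaceText(doc.getDocumentElement());
        System.out.println(grandparent.getChildNodes().getLength()); // 2
    }
}
```

In the question's own loop, the first option amounts to wrapping the println in if (n.getNodeType() == Node.ELEMENT_NODE) { ... }.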

Upvotes: 8
