Reputation: 8158
I'm making an Android application that does DOM parsing on an XML file. I have an XML file that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<family>
    <grandparent>
        <parent1>
            <child1>Foo</child1>
            <child2>Bar</child2>
        </parent1>
        <parent2>
            <child1>Raz</child1>
            <child2>Mataz</child2>
        </parent2>
    </grandparent>
</family>
If I run a DOM parser on it, like this:
try {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(input);
    doc.getDocumentElement().normalize(); // added after the edit below
    NodeList nodd = doc.getElementsByTagName("grandparent");
    for (int x = 0; x < nodd.getLength(); x++) {
        Node node = nodd.item(x);
        NodeList nodes = node.getChildNodes();
        for (int y = 0; y < nodes.getLength(); y++) {
            Node n = nodes.item(y);
            System.out.println(n.getNodeName());
        }
    }
} catch (Exception e) {
    // parse() and newDocumentBuilder() throw checked exceptions
    e.printStackTrace();
}
My application prints out the following:
07-20 18:24:28.395: INFO/System.out(491): #text
07-20 18:24:28.395: INFO/System.out(491): parent1
07-20 18:24:28.395: INFO/System.out(491): #text
07-20 18:24:28.395: INFO/System.out(491): parent2
07-20 18:24:28.395: INFO/System.out(491): #text
My question is: what are those #text nodes and, more importantly, how do I get rid of them?
Edit: Now that I know what they are, I tried to normalize the document. I have updated the code to reflect the change, but I get the same result.
Upvotes: 10
Views: 9834
Reputation: 2781
This is what your code does:
1) It gets a node list of all the grandparent elements:
NodeList nodd = doc.getElementsByTagName("grandparent");
2) It gets all the child nodes of grandparent x:
NodeList nodes = node.getChildNodes();
which are the sub-nodes of
<grandparent>
    <parent1>
    ...
    </parent1>
    <parent2>
    ...
    </parent2>
</grandparent>
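3) It gets child y:
nodes.item(y);
There can be text between the elements, and that is the #text you are seeing. In your file that text is only whitespace (the line breaks and indentation between the tags). If you had: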
<grandparent>
    yourTextHere1
    <parent1>
    ...
    </parent1>
    yourTextHere2
    <parent2>
    ...
    </parent2>
    yourTextHere3
</grandparent>
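the child nodes would be, in order:
yourTextHere1, parent1, yourTextHere2, parent2, yourTextHere3
(getNodeName() still returns #text for each text node; its content is available through getNodeValue().)
I hope this helps! Julien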
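To skip the #text nodes while looping, one option is to check each child's node type and keep only elements. The following is a minimal sketch, not the asker's actual code: the class name ElementChildren and the helper elementChildNames are hypothetical, and the XML is passed in as a string for the sake of a self-contained example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ElementChildren {
    // Hypothetical helper: collects the names of the element children of every
    // <grandparent>, ignoring text nodes (including whitespace-only ones).
    static List<String> elementChildNames(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> names = new ArrayList<>();
        NodeList grandparents = doc.getElementsByTagName("grandparent");
        for (int x = 0; x < grandparents.getLength(); x++) {
            NodeList children = grandparents.item(x).getChildNodes();
            for (int y = 0; y < children.getLength(); y++) {
                Node n = children.item(y);
                // Keep only element nodes; text nodes report Node.TEXT_NODE.
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(n.getNodeName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<family>\n"
                + "  <grandparent>\n"
                + "    <parent1><child1>Foo</child1><child2>Bar</child2></parent1>\n"
                + "    <parent2><child1>Raz</child1><child2>Mataz</child2></parent2>\n"
                + "  </grandparent>\n"
                + "</family>\n";
        System.out.println(elementChildNames(xml)); // prints [parent1, parent2]
    }
}
```

The text nodes are still in the tree; the loop simply ignores them, which is usually enough when you only want to walk the elements.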
Upvotes: 1
Reputation: 655
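Do this when parsing the document:
Document doc = builder.parse(input);
doc.getDocumentElement().normalize();
This merges adjacent text nodes and removes empty ones. Note that the whitespace-only #text children between elements are not empty, so normalize() alone leaves them in place, which matches the result reported in the question's edit.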
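The question's edit reports that calling normalize() did not change the output. If the goal is to actually remove the whitespace-only text nodes from the tree, one possible approach is a small recursive pass after parsing. This is only a sketch under assumptions: the class WhitespaceStripper and the helpers parse and stripWhitespaceText are hypothetical names, and the sample XML is a shortened version of the file in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class WhitespaceStripper {
    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical helper: recursively removes text nodes that contain only whitespace.
    static void stripWhitespaceText(Node parent) {
        NodeList children = parent.getChildNodes();
        // Walk backwards: the NodeList is live and shrinks as nodes are removed.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                parent.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceText(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<grandparent>\n"
                + "  <parent1>Foo</parent1>\n"
                + "  <parent2>Bar</parent2>\n"
                + "</grandparent>";
        Element root = parse(xml).getDocumentElement();
        root.normalize();
        // Whitespace text nodes survive normalize(): 3 text nodes + 2 elements.
        System.out.println("after normalize(): " + root.getChildNodes().getLength()); // 5
        stripWhitespaceText(root);
        System.out.println("after stripping: " + root.getChildNodes().getLength());   // 2
    }
}
```

Text nodes with real content, such as Foo and Bar above, are left untouched because they are not whitespace-only.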
Upvotes: 0