Reputation: 11
This is the first time I have had to work with XML files in Java.
I have a simple XML file:
<?xml version="1.0" encoding="UTF-8"?>
<ItemList>
    <Item id="1">
        <Clothes>
            <element1>Test Cloth</element1>
            <element2>1</element2>
            <element3>true</element3>
            <element4>1</element4>
            <element5>100</element5>
            <element6>4</element6>
            <element7>false</element7>
        </Clothes>
    </Item>
</ItemList>
Java:
InputStream is = ItemsLoader.class.getResourceAsStream("ItemList.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(is);
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("Item");
for (int i = 0; i < nList.getLength(); i++) {
    Node nNode = nList.item(i);
    Element eItemElement = (Element) nNode;
    Node elementNode = eItemElement.getFirstChild();
    System.out.println("Item Node name: " + nNode.getNodeName());
    System.out.println("Element Node name: " + elementNode.getNodeName());
}
My output is:
Item Node name: Item
Element Node name: #text
Why can't I get the child node? The child node of Item should be 'Clothes'.
Thanks for support!
Upvotes: 1
Views: 63
Reputation: 16736
That's because your input XML is indented, and as such, it has whitespace characters.
The first child of Item is actually a text node, containing all the spaces/tabs/newlines that exist between the > of Item and the < of Clothes.
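If you want to avoid this, you'll have to either condense your XML file so that it contains no whitespace between tags, or put your JAXP parser in validating mode (setValidating(true) on the DocumentBuilderFactory) and tell it to drop ignorable whitespace (setIgnoringElementContentWhitespace(true)). The second option only works if the document has a DTD, because the parser needs the content model to know which whitespace is ignorable.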
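The second option might look roughly like the sketch below. It assumes a shortened copy of the question's file with an inline DTD added (the DTD, the class name IgnoreWhitespaceDemo and the method firstChildName are made up for illustration; the original file has no DTD, so it would need one before this approach can work).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class IgnoreWhitespaceDemo {

    // Assumption: a shortened ItemList.xml with a hypothetical inline DTD.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<!DOCTYPE ItemList [\n"
      + "  <!ELEMENT ItemList (Item*)>\n"
      + "  <!ELEMENT Item (Clothes)>\n"
      + "  <!ATTLIST Item id CDATA #REQUIRED>\n"
      + "  <!ELEMENT Clothes (element1)>\n"
      + "  <!ELEMENT element1 (#PCDATA)>\n"
      + "]>\n"
      + "<ItemList>\n"
      + "  <Item id=\"1\">\n"
      + "    <Clothes>\n"
      + "      <element1>Test Cloth</element1>\n"
      + "    </Clothes>\n"
      + "  </Item>\n"
      + "</ItemList>\n";

    // Returns the node name of the first child of the first Item element.
    static String firstChildName() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Validation gives the parser the content model from the DTD ...
            factory.setValidating(true);
            // ... which lets it drop whitespace that sits between child elements.
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element item = (Element) doc.getElementsByTagName("Item").item(0);
            return item.getFirstChild().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("First child of Item: " + firstChildName());
    }
}
```

With the whitespace-only text nodes dropped by the parser, getFirstChild() on Item should hand back the Clothes element directly.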
Upvotes: 1
Reputation: 23637
This <Item> element has one child node:
<Item id="1"><Clothes>...</Clothes></Item>
This other one has three. Two of them are invisible:
<Item id="1">
<Clothes>...</Clothes>
</Item>
The invisible nodes are shown here (I replaced them with [#...#]):
<Item id="1">[#
#####]<Clothes>...</Clothes>[#]
</Item>
They are text nodes (Text) and they contain all the whitespace characters up to the next node of a different type. When you use a method such as getFirstChild(), which returns a Node, you will get the first Node whatever its type is. You can't assume it will be an element unless you have stripped all whitespace between elements when parsing the source document. Even then it might not be an element: Comment nodes and ProcessingInstruction nodes also count as children.
The safe way to access your child element nodes is to test whether each node is actually an element. You can do that by comparing the value returned by getNodeType() with the node type constants defined in the Node interface, and skipping the nodes that are not elements.
if (node.getNodeType() == Node.ELEMENT_NODE) {
    // this is an element, so the cast is safe
    Element myElement = (Element) node;
}
You can also use other APIs like DOM4J or JDOM which include extra methods that return child elements, a standard XPath API where you can get a NodeList of elements as the result, or standard DOM methods such as getElementsByTagName which you can call from your context element reference and get all descendant elements from your subtree.
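As a rough illustration of the XPath route, the sketch below uses the standard javax.xml.xpath API on a shortened copy of the question's XML. The class name XPathChildElements, the method childElementNames and the expression /ItemList/Item/* are choices made for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathChildElements {

    // Assumption: a shortened, indented copy of the question's ItemList.xml.
    static final String XML =
        "<ItemList>\n"
      + "  <Item id=\"1\">\n"
      + "    <Clothes>\n"
      + "      <element1>Test Cloth</element1>\n"
      + "    </Clothes>\n"
      + "  </Item>\n"
      + "</ItemList>\n";

    // Returns the names of all element children of Item, comma-separated.
    static String childElementNames() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "*" matches element children only, so whitespace text nodes are skipped.
            NodeList children = (NodeList) xpath.evaluate(
                "/ItemList/Item/*", doc, XPathConstants.NODESET);
            StringBuilder names = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                if (i > 0) {
                    names.append(',');
                }
                names.append(children.item(i).getNodeName());
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the XPath expression", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Element children of Item: " + childElementNames());
    }
}
```

Because the expression selects elements rather than nodes, the indentation in the source file does not affect the result.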
In your program, you can retrieve the Clothes element node by iterating over the node list returned by getChildNodes() and taking the first node whose node type is Node.ELEMENT_NODE.
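A minimal sketch of that loop, assuming a shortened copy of the question's XML held in a string (the class FirstChildElementDemo and the helpers firstChildElement and describe are hypothetical names, not part of the DOM API):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstChildElementDemo {

    // Assumption: a shortened, indented copy of the question's ItemList.xml.
    static final String XML =
        "<ItemList>\n"
      + "  <Item id=\"1\">\n"
      + "    <Clothes>\n"
      + "      <element1>Test Cloth</element1>\n"
      + "    </Clothes>\n"
      + "  </Item>\n"
      + "</ItemList>\n";

    // Returns the first child of parent that is an element, or null if there is none.
    static Element firstChildElement(Node parent) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) child;
            }
        }
        return null;
    }

    // Shows what getFirstChild() returns next to what the helper returns.
    static String describe() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Element item = (Element) doc.getElementsByTagName("Item").item(0);
            return item.getFirstChild().getNodeName()
                + " / " + firstChildElement(item).getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "#text / Clothes"
    }
}
```

Returning null when no element child exists keeps the helper simple; callers that cannot tolerate a missing child would need to check for it.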
Upvotes: 2