user3465491

Reputation: 11

XML Java reads Node?

This is my first time working with XML files in Java.

I have a simple XML file:

<?xml version="1.0" encoding="UTF-8"?>
<ItemList>
        <Item id="1">
            <Clothes>
                <element1>Test Cloth</element1>
                <element2>1</element2>
                <element3>true</element3>
                <element4>1</element4>
                <element5>100</element5>
                <element6>4</element6>
                <element7>false</element7>
            </Clothes>
        </Item>
</ItemList>

Java:

InputStream is = ItemsLoader.class.getResourceAsStream("ItemList.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(is);

doc.getDocumentElement().normalize();

NodeList nList = doc.getElementsByTagName("Item");

for (int i = 0; i < nList.getLength(); i++) {
    Node nNode = nList.item(i);

    Element eItemElement = (Element)nNode;
    Node elementNode = eItemElement.getFirstChild();

    System.out.println("Item Node name: " + nNode.getNodeName());
    System.out.println("Element Node name: " + elementNode.getNodeName());

}

My output is:

Item

#text

Why can't I get the child node? The child node of Item should be 'Clothes'.

Thanks for support!

Upvotes: 1

Views: 63

Answers (2)

Isaac

Reputation: 16736

That's because your input XML is indented, and as such, it has whitespace characters. The first child of Item is actually a text node, containing all the spaces/tabs/newlines that exist between the > of Item and the < of Clothes.

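If you want to avoid this, you'll have to either condense your XML file so it contains no whitespace between tags, or put your JAXP parser in validating mode (DocumentBuilderFactory.setValidating(true)) and tell it to drop ignorable whitespace (setIgnoringElementContentWhitespace(true)). The second option only works if the document declares a DTD, because the parser needs the content model to know which whitespace is ignorable.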
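As a rough sketch of the validating-mode option, the example below parses a trimmed copy of the question's XML. The inline DTD and the class name IgnoreWhitespaceDemo are assumptions made for illustration; the original file has no DTD, so one would have to be written for it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class IgnoreWhitespaceDemo {
    // Trimmed version of the question's XML. The DTD is hypothetical:
    // it exists only so the parser can tell which whitespace is ignorable.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE ItemList [\n"
        + "  <!ELEMENT ItemList (Item*)>\n"
        + "  <!ELEMENT Item (Clothes)>\n"
        + "  <!ATTLIST Item id CDATA #REQUIRED>\n"
        + "  <!ELEMENT Clothes (element1)>\n"
        + "  <!ELEMENT element1 (#PCDATA)>\n"
        + "]>\n"
        + "<ItemList>\n"
        + "    <Item id=\"1\">\n"
        + "        <Clothes>\n"
        + "            <element1>Test Cloth</element1>\n"
        + "        </Clothes>\n"
        + "    </Item>\n"
        + "</ItemList>\n";

    // Parses XML in validating mode and returns the name of Item's first child.
    static String firstChildName() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);                       // requires a DTD
            factory.setIgnoringElementContentWhitespace(true); // drop ignorable whitespace
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element item = (Element) doc.getElementsByTagName("Item").item(0);
            return item.getFirstChild().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("First child of Item: " + firstChildName());
    }
}
```

With the whitespace-only text nodes removed, getFirstChild() on Item returns the Clothes element directly. If adding a DTD is not practical, checking the node type in code is usually the simpler route.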

Upvotes: 1

helderdarocha

Reputation: 23637

This <Item> element has one child node:

<Item id="1"><Clothes>...</Clothes></Item>

This other one has three. Two of them are invisible:

<Item id="1">
     <Clothes>...</Clothes> 
</Item>

The invisible nodes are shown here (I replaced them with [#...#]):

<Item id="1">[#
#####]<Clothes>...</Clothes>[#]
</Item>

They are text nodes (Text) and they contain all the whitespace characters up to the next node of a different type. When you use a method such as getFirstChild(), which returns a Node, you get the first node whatever its type is. You can't assume it will be an element unless you have stripped all whitespace between elements when parsing the source document. Even then it might not be an element: Comment nodes and Processing Instruction nodes also count as children.
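To make the difference visible, here is a small sketch that lists the child node names of both variants. The class and method names (ChildNodeCountDemo, describeChildren) are made up for this example, and the Item elements are parsed as stand-alone documents to keep it short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ChildNodeCountDemo {
    static final String COMPACT =
        "<Item id=\"1\"><Clothes>x</Clothes></Item>";
    static final String INDENTED =
        "<Item id=\"1\">\n     <Clothes>x</Clothes>\n</Item>";

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    // Returns the node names of the root's direct children, comma-separated.
    static String describeChildren(String xml) {
        NodeList children = parseRoot(xml).getChildNodes();
        StringBuilder names = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            if (i > 0) {
                names.append(", ");
            }
            names.append(children.item(i).getNodeName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeChildren(COMPACT));  // Clothes
        System.out.println(describeChildren(INDENTED)); // #text, Clothes, #text
    }
}
```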

The safe way to access your child element nodes is to test whether each node is actually an element and skip the ones that are not. You can do that by comparing the value returned by getNodeType() with the node type constants defined in the Node interface.

if (node.getNodeType() == Node.ELEMENT_NODE) {
    // this is an element, so the cast is safe
    Element myElement = (Element) node;
}

You can also use other APIs such as DOM4J or JDOM, which include extra methods that return only child elements; the standard XPath API, where you can get a NodeList of elements as the result; or standard DOM methods such as getElementsByTagName, which you can call on your context element to get all matching descendant elements in its subtree.
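A minimal sketch of the two standard-library options (XPath and getElementsByTagName called on the Item element) might look like this. The class and helper names are hypothetical, and the XML is a shortened copy of the one in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ChildElementLookupDemo {
    static final String XML =
          "<ItemList>\n"
        + "    <Item id=\"1\">\n"
        + "        <Clothes>\n"
        + "            <element1>Test Cloth</element1>\n"
        + "        </Clothes>\n"
        + "    </Item>\n"
        + "</ItemList>";

    // Parses the sample XML and returns the first Item element.
    static Element firstItem() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("Item").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    // The XPath expression "*" selects only the child ELEMENTS of the context
    // node, so whitespace text nodes and comments never show up in the result.
    static NodeList childElementsViaXPath(Element context) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate("*", context, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Element item = firstItem();

        NodeList viaXPath = childElementsViaXPath(item);
        System.out.println("XPath: " + viaXPath.item(0).getNodeName());

        // getElementsByTagName searches all descendants of item, not just children.
        NodeList viaTagName = item.getElementsByTagName("Clothes");
        System.out.println("getElementsByTagName: " + viaTagName.item(0).getNodeName());
    }
}
```

Note that getElementsByTagName matches at any depth below the context element, so it fits best when the tag name is unique within the subtree.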

In your program, you can retrieve the Clothes element node by iterating over the node list returned by getChildNodes() and taking the first node whose type is Node.ELEMENT_NODE.
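One possible way to write that loop is sketched below. The helper firstChildElement and the class name are not part of the DOM API; they are invented for this example, and the comment inside Item is added only to show that non-text, non-element nodes are skipped too.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class FirstChildElementDemo {
    static final String XML =
          "<ItemList>\n"
        + "    <Item id=\"1\">\n"
        + "        <!-- comments are child nodes too -->\n"
        + "        <Clothes>\n"
        + "            <element1>Test Cloth</element1>\n"
        + "        </Clothes>\n"
        + "    </Item>\n"
        + "</ItemList>";

    // Returns the first child of parent that is an element, or null if none exists.
    static Element firstChildElement(Node parent) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) child;
            }
        }
        return null;
    }

    // Parses the sample XML and returns the first Item node.
    static Node firstItem() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("Item").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        Node item = firstItem();
        System.out.println("getFirstChild():     " + item.getFirstChild().getNodeName());
        System.out.println("firstChildElement(): " + firstChildElement(item).getNodeName());
    }
}
```

Here getFirstChild() still returns the whitespace text node (#text), while firstChildElement() returns Clothes. The helper returns null for an element with no element children, so callers should check for that.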

Upvotes: 2
