MysteryMoose
MysteryMoose

Reputation: 2363

Handling Empty Nodes Using Java DOM

I have a question concerning XML, Java's use of DOM, and empty nodes. I am currently working on a project wherein I take an XML descriptor file of abstract machines (for text parsing) and parse a series of input strings with them. The actual building and interpretation of these abstract machines is all done and working fine, but I have come across a rather interesting XML requirement. Specifically, I need to be able to turn an empty InputString node into an empty string ("") and still execute my parsing routines. The problem, however, occurs when I attempt to extract this blank node from my XML tree. This causes a null pointer exception and then generally bad things start happening. Here is the offending snippet of XML (Note the first element is empty):

    <InputStringList>
        <InputString></InputString>
        <InputString>000</InputString>
        <InputString>111</InputString>
        <InputString>01001</InputString>
        <InputString>1011011</InputString>
        <InputString>1011000</InputString>
        <InputString>01010</InputString>
        <InputString>1010101110</InputString>
    </InputStringList>

I extract my strings from the list using:

//Get input strings to be validated
xmlElement = (Element)xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
for (int j = 0; j < xmlNodeList.getLength(); j++) {

    //Add input string to list
    if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
        arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());

    } else {
        arrInputStrings.add("");

    }
}

How should I handle this empty case? I have found a lot of information on removing blank text nodes, but I still actually have to parse the blank nodes as empty strings. Ideally, I would like to avoid using a special character to denote a blank string.

Thank you in advance for your time.

Upvotes: 4

Views: 14233

Answers (2)

Lukas Eder
Lukas Eder

Reputation: 221106

You could use a library like jOOX to generally simplify standard DOM manipulation. With jOOX, you'd get the list of strings as such:

List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
                                    .find(XML_INPUT_STRING)
                                    .texts();

Upvotes: 1

bobince
bobince

Reputation: 536567

if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {

nodeValue shouldn't be null; it would be firstChild itself that might be null and should be checked for:

Node firstChild= xmlNodeList.item(j).getFirstChild();
arrInputStrings.add(firstChild==null? "" : firstChild.getNodeValue());

However note that this is still sensitive to the content being only one text node. If you had an element with another element in, or some text and a CDATA section, just getting the value of the first child isn't enough to read the whole text.

What you really want is the textContent property from DOM Level 3 Core, which will give you all the text inside the element, however contained.

arrInputStrings.add(xmlNodeList.item(j).getTextContent());

This is available in Java 1.5 onwards.

Upvotes: 8

Related Questions