Sanjana

Reputation: 467

Removing whitespace in SAX Parser

I have the following XML file. Why does whitespace still reach characters() even after I apply validation?

<Employee>
<Name>
James
</Name>
<Id>
11
</Id>
</Employee>

I am trying to display the text between the tags.

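import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {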

    boolean isName = false;
    boolean isId = false;

    @Override
    public void characters(char[] arg0, int arg1, int arg2) throws SAXException {
        if (isName) {
            System.out.println(new String(arg0, arg1, arg2));
            isName = false;
        }
        if (isId) {
            System.out.println(new String(arg0, arg1, arg2));
            isId = false;
        }
    }

    @Override
    public void startElement(String arg0, String arg1, String arg2,
            Attributes arg3) throws SAXException {          
        if (arg2.equalsIgnoreCase("Name")) {
            isName = true;
        }
        if (arg2.equalsIgnoreCase("Id")) {
            isId = true;
        }
    }

}

Desired Output:

James
11

Actual Output:

James

11

Why is whitespace appearing in the output?

Upvotes: 2

Views: 5159

Answers (3)

user207421

Reputation: 311008

If you use a validating parser, it will report ignorable whitespace via the ignorableWhitespace() method instead of characters().

Otherwise the parser is perfectly entitled to give you whitespace via characters(). See the Javadoc.
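
A minimal sketch of that distinction, assuming the document carries a DTD (the internal subset below is an assumption; the question does not show one) and the JDK's bundled SAX parser is used. The class name WhitespaceCounter and its fields are hypothetical. It counts what arrives through ignorableWhitespace() and collects what arrives through characters():

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: separates ignorable whitespace from character data.
class WhitespaceCounter extends DefaultHandler {
    // Assumed DTD: Employee has element-only content, Name and Id hold text.
    static final String SAMPLE =
        "<!DOCTYPE Employee [\n"
        + "<!ELEMENT Employee (Name, Id)>\n"
        + "<!ELEMENT Name (#PCDATA)>\n"
        + "<!ELEMENT Id (#PCDATA)>\n"
        + "]>\n"
        + "<Employee>\n<Name>\nJames\n</Name>\n<Id>\n11\n</Id>\n</Employee>";

    int ignorableChars = 0;                             // whitespace in element-only content
    final StringBuilder reported = new StringBuilder(); // everything sent to characters()

    @Override
    public void characters(char[] ch, int start, int length) {
        reported.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableChars += length;
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e; // surface validation errors instead of ignoring them
    }

    static WhitespaceCounter parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation
        WhitespaceCounter handler = new WhitespaceCounter();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

With this input, only the newlines directly inside Employee are ignorable. The newlines around James and 11 sit inside elements declared as #PCDATA, so they are still delivered through characters().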

Upvotes: 0

Michael Kay

Reputation: 163448

You could have the whitespace removed for you by putting the XML through a schema (XSD) validator and declaring Name and Id with a type that collapses whitespace, e.g. xs:token. A DTD validator will never do this for text nodes (only for attribute nodes).
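
To make "collapses whitespace" concrete, here is a small sketch of the rule that XML Schema applies to xs:token values: tabs and line breaks become spaces, runs of spaces shrink to one, and leading and trailing spaces are dropped. This is plain Java that mimics the rule for illustration, not a call into a schema validator; the class and method names are hypothetical.

```java
// Hypothetical helper that mimics the whitespace "collapse" rule used by xs:token.
class WhitespaceCollapse {
    static String collapse(String value) {
        // Replace each run of space, tab, LF, or CR with a single space.
        String singleSpaced = value.replaceAll("[ \\t\\n\\r]+", " ");
        // Drop the one leading and/or trailing space that may remain.
        return singleSpaced.replaceAll("^ | $", "");
    }
}
```

Under this rule, the text of Name in the question, a newline followed by James and another newline, collapses to just James.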

Upvotes: 2

Jim Garrison

Reputation: 86774

The actual string value of the text node that is a child of the <Name> element is

\nJames\n

Likewise, the string value of the text node in the <Id> element is

\n11\n

where \n represents a newline character. None of the newlines are ignorable whitespace. If you want to remove them you must do it yourself, in your Java code.
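
One way to do that in the handler, as a rough sketch: buffer the text in a StringBuilder, because characters() may be called more than once for a single text node, and trim it when the element ends. The class name TrimmingHandler and the parse helper are hypothetical and not part of the question's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the trimmed text of Name and Id elements.
class TrimmingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // discard whitespace seen before this start tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("Name") || qName.equalsIgnoreCase("Id")) {
            values.add(text.toString().trim()); // strip the surrounding newlines
        }
        text.setLength(0);
    }

    List<String> getValues() {
        return values;
    }

    // Parses the given XML text and returns the trimmed Name and Id values.
    static List<String> parse(String xml) throws Exception {
        TrimmingHandler handler = new TrimmingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }
}
```

For the XML in the question, TrimmingHandler.parse(xml) should return the two values James and 11 without the newlines.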

Upvotes: 2
