Reputation: 467
I have the following XML file. Why is whitespace still delivered to characters()
even after applying validation?
<Employee>
<Name>
James
</Name>
<Id>
11
</Id>
</Employee>
I am trying to display text in between the tags.
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {
    // Set in startElement() when the corresponding tag opens.
    boolean isName = false;
    boolean isId = false;

    @Override
    public void characters(char[] arg0, int arg1, int arg2) throws SAXException {
        // Prints the raw text, including any surrounding newlines.
        if (isName) {
            System.out.println(new String(arg0, arg1, arg2));
            isName = false;
        }
        if (isId) {
            System.out.println(new String(arg0, arg1, arg2));
            isId = false;
        }
    }

    @Override
    public void startElement(String arg0, String arg1, String arg2,
            Attributes arg3) throws SAXException {
        // arg2 is the qualified element name.
        if (arg2.equalsIgnoreCase("Name")) {
            isName = true;
        }
        if (arg2.equalsIgnoreCase("Id")) {
            isId = true;
        }
    }
}
Desired Output:
James
11
Actual Output:
James
11
Why are spaces appearing in the output?
Upvotes: 2
Views: 5159
Reputation: 311008
If you use a validating parser, it will report ignorable whitespace via the ignorableWhitespace() method instead of characters().
Otherwise the parser is perfectly entitled to give you whitespace via characters().
See the Javadoc.
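To see where the whitespace ends up, here is a minimal sketch that assumes an internal DTD is added to the question's XML (the DTD, the class name IgnorableWhitespaceDemo and the CountingHandler helper are illustrative and not part of the original post). It turns on DTD validation and records what arrives through each callback. Under these assumptions, the newlines between the child elements of Employee should be reported as ignorable, while the newlines inside Name and Id still arrive through characters().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {

    // Hypothetical helper: counts ignorable characters and collects character data.
    static class CountingHandler extends DefaultHandler {
        int ignorableChars = 0;
        final StringBuilder text = new StringBuilder();

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length; // whitespace in element-only content
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // everything else, newlines included
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e; // fail loudly on validation errors
        }
    }

    // Assumed DTD for the question's document; Employee has element-only content.
    static final String XML =
            "<!DOCTYPE Employee [\n"
            + "<!ELEMENT Employee (Name, Id)>\n"
            + "<!ELEMENT Name (#PCDATA)>\n"
            + "<!ELEMENT Id (#PCDATA)>\n"
            + "]>\n"
            + "<Employee>\n<Name>\nJames\n</Name>\n<Id>\n11\n</Id>\n</Employee>";

    static CountingHandler parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation
        CountingHandler handler = new CountingHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CountingHandler h = parse(XML);
        System.out.println("ignorable chars: " + h.ignorableChars);
        System.out.println("text contains newlines: " + h.text.toString().contains("\n"));
    }
}
```

So validation separates the two kinds of whitespace, but it does not strip the newlines that are part of the Name and Id text itself.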
Upvotes: 0
Reputation: 163448
You could have the whitespace removed for you if you put the XML through a schema (XSD) validator and declare the types of Name and Id with a type that collapses all whitespace, e.g. type xs:token. A DTD validator will never do this for text nodes (only for attribute nodes).
Upvotes: 2
Reputation: 86774
The actual string value of the text node that is a child of the <Name> tag is \nJames\n
Likewise, the string value of the text node in the <Id> tag is \n11\n
where \n represents a newline character. None of the newlines are ignorable whitespace. If you want to remove them you must do it yourself, in your Java code.
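One way to do that in Java, as a minimal sketch rather than the only approach: buffer the text in a StringBuilder, because a parser may deliver one text node through several characters() calls, and trim it in endElement(). The names TrimmingHandlerDemo, TrimmingHandler and extract are illustrative and not from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TrimmingHandlerDemo {

    // Hypothetical handler: buffers text per element and trims it in endElement().
    static class TrimmingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        final List<String> values = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attrs) {
            text.setLength(0); // start a fresh buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("Name") || qName.equals("Id")) {
                values.add(text.toString().trim()); // drop leading/trailing whitespace
            }
            text.setLength(0);
        }
    }

    // Parses the given XML string and returns the trimmed Name and Id values.
    static List<String> extract(String xml) throws Exception {
        TrimmingHandler handler = new TrimmingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Employee>\n<Name>\nJames\n</Name>\n<Id>\n11\n</Id>\n</Employee>";
        for (String value : extract(xml)) {
            System.out.println(value);
        }
    }
}
```

Note that trim() only removes whitespace at the two ends of the value. Whitespace inside the text, such as the space in a two-word name, is left alone.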
Upvotes: 2