Reputation: 565
I'm parsing the following XML using a SAX parser:
<Person>
<Name>Test</Name>
<Phone>111-111-2222</Phone>
<Address>lee h&amp;y</Address>
</Person>
The characters method of the SAX parser only reads the address data up to 'lee h'; it does not seem to treat '&' as a character. I need to get the complete text of the Address element. Any ideas on how I should do it? This is my SAX handler (here address is a flag indicating that an Address element has started):
boolean address = false;

public void startElement(String uri, String localName,
        String qName, Attributes attributes)
        throws SAXException {
    if (qName.equalsIgnoreCase("Address")) {
        address = true;
    }
}

public void characters(char ch[], int start, int length)
        throws SAXException {
    String data = new String(ch, start, length);
    if (address) {
        System.out.println("Address is: " + data);
        address = false;
    }
}
and the output is: lee h
Upvotes: 0
Views: 4105
Reputation: 4954
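The characters method is called three times here to report the content of the Address element because of the entity reference (the escaped '&'): typically once for 'lee h', once for '&', and once for 'y'. Your handler prints the first chunk and then clears the flag, so the rest is dropped. Accumulate the content passed to each characters call until you receive the endElement event; at that point you have the complete content.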
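Please note the documentation of the characters method: a SAX parser may return all contiguous character data in a single chunk, or it may split it into several chunks.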
You could also benefit from the use of the ignorableWhitespace method with a validating parser and the appropriate schema (e.g. DTD) to let the parser know which spaces are ignorable (due to indentation).
In Java, it could be:
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {
    // Buffers the text reported by successive characters() calls.
    private StringBuilder acc;

    public MyHandler() {
        acc = new StringBuilder();
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // The element is complete: print the buffered text and reset the buffer.
        System.out.printf("Characters accumulated: %s\n", acc.toString());
        acc.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        acc.append(ch, start, length);
    }
}
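To tie this back to the XML in the question, here is a minimal, self-contained sketch that buffers text only while inside the Address element. The names AddressDemo, AddressHandler and parseAddress are made up for illustration, and the sample string assumes the '&' is escaped as &amp; in the real file, since a bare '&' would not be well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AddressDemo {
    // Hypothetical handler: buffers text only while inside <Address>.
    static class AddressHandler extends DefaultHandler {
        private final StringBuilder acc = new StringBuilder();
        private boolean inAddress = false;
        String address; // complete text, assigned in endElement

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes attributes) {
            if (qName.equalsIgnoreCase("Address")) {
                inAddress = true;
                acc.setLength(0); // start with an empty buffer
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one element; append every chunk.
            if (inAddress) {
                acc.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("Address")) {
                address = acc.toString(); // all chunks joined
                inAddress = false;
            }
        }
    }

    // Parses the XML string and returns the Address text (null if absent).
    static String parseAddress(String xml) throws Exception {
        AddressHandler handler = new AddressHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.address;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Person><Name>Test</Name><Phone>111-111-2222</Phone>"
                + "<Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + parseAddress(xml));
    }
}
```

The flag is cleared in endElement rather than in characters, which is the only real change from the handler in the question. Run as written, this prints "Address is: lee h&y" no matter how many chunks the parser delivers.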
Upvotes: 6
Reputation: 7044
The answer depends to some extent on which parser you're using.
Here's a thorough rundown on the issue: http://www.ibm.com/developerworks/xml/library/x-tipsaxdo4/index.html
With a StAX parser you can set the property isCoalescing=true (XMLInputFactory.IS_COALESCING). This property specifies whether to coalesce adjacent character data.
But with SAX there is no such control, generally.
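For the StAX route, a rough sketch might look like the following. It assumes the JDK's built-in StAX implementation and the same escaped sample data (&amp;) as the question; CoalescingDemo and readAddress are hypothetical names used only for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class CoalescingDemo {
    // Returns the text of the first non-empty <Address> element, or null.
    static String readAddress(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the reader to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Look for <Address> followed directly by a CHARACTERS event.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Address".equals(reader.getLocalName())
                        && reader.next() == XMLStreamConstants.CHARACTERS) {
                    return reader.getText(); // single coalesced chunk
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Person><Name>Test</Name>"
                + "<Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + readAddress(xml));
    }
}
```

With coalescing enabled, the text on either side of the entity reference should arrive as one CHARACTERS event, so a single getText() call is enough. Without the property, the reader is free to split the text just as SAX does.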
Upvotes: 0