Srinivas

Reputation: 565

How to read escaped characters using SAX parser in Characters method?

I'm parsing the following XML using a SAX parser:

<Person>
<Name>Test</Name>
<Phone>111-111-2222</Phone>
<Address>lee h&amp;y</Address>
</Person>

The characters method of the SAX parser only reads the address data up to 'lee h', as if it does not consider '&' to be a character. I need to get the complete text of the Address element. Any ideas on how I should do it? This is my SAX handler (here address is a flag which signals that an Address element is present in the XML):

boolean address = false;

public void startElement(String uri, String localName,
        String qName, Attributes attributes)
        throws SAXException {
    if (qName.equalsIgnoreCase("Address")) {
        address = true;
    }
}

public void characters(char ch[], int start, int length)
        throws SAXException {
    String data = new String(ch, start, length);
    if (address) {
        System.out.println("Address is: " + data);
        address = false;
    }
}

and the output is: lee h

Upvotes: 0

Views: 4105

Answers (2)

Ludovic Kuty

Reputation: 4954

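The characters method is typically called three times here to report the content of the Address element (once for "lee h", once for the "&" that the entity reference &amp; expands to, and once for "y"). Note that &amp; is a predefined entity reference rather than an external entity, but the parser is still free to split the text around it. You should accumulate the content of the calls to characters until you receive an endElement event; at that point you have the complete content.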

Please note the documentation of the characters method.

You could also benefit from the use of the ignorableWhitespace method with a validating parser and the appropriate schema (e.g. DTD) to let the parser know which spaces are ignorable (due to indentation).

In Java, it could be:

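import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {

    // Collects the text reported by successive characters() calls
    private StringBuilder acc;

    public MyHandler() {
        acc = new StringBuilder();
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // The element is complete, so the accumulated text is complete too
        System.out.printf("Characters accumulated: %s%n", acc.toString());
        acc.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // May be called several times for a single text node
        acc.append(ch, start, length);
    }
}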
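
Applied to the XML from the question, the same accumulation idea could look roughly like the sketch below. It only collects text while inside Address and reads the result in endElement. The names AddressHandler and readAddress are made up for this illustration, and the sample XML is written with matching tags so that it parses.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates only the text inside <Address>
class AddressHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inAddress = false;
    private String address;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if (qName.equalsIgnoreCase("Address")) {
            inAddress = true;
            text.setLength(0); // start from an empty buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Keep appending; the flag is NOT cleared here, only in endElement
        if (inAddress) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("Address")) {
            address = text.toString(); // complete content is available now
            inAddress = false;
        }
    }

    String getAddress() {
        return address;
    }

    // Hypothetical helper: parses an XML string and returns the Address text
    static String readAddress(String xml) {
        try {
            AddressHandler handler = new AddressHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getAddress();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Person><Name>Test</Name><Phone>111-111-2222</Phone>"
                + "<Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + readAddress(xml));
    }
}
```

The key difference from the code in the question is that the flag is reset in endElement rather than after the first characters call, so every chunk of the address ends up in the buffer.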

Upvotes: 6

Mike Sokolov

Reputation: 7044

The answer depends to some extent on which parser you're using.

Here's a thorough rundown on the issue: http://www.ibm.com/developerworks/xml/library/x-tipsaxdo4/index.html

With a StAX parser you can set the property isCoalescing to true (XMLInputFactory.IS_COALESCING). This property specifies whether to coalesce adjacent character data.

But with SAX there is no such control, generally.
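
For comparison, the StAX coalescing property mentioned above could be used roughly as in the sketch below. It assumes the StAX implementation bundled with the JDK; the class name CoalescingDemo and the helper readAddress are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: reads the Address text with a coalescing StAX reader
class CoalescingDemo {

    static String readAddress(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character data into one event
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            String address = null;
            boolean inAddress = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "Address".equals(reader.getLocalName())) {
                    inAddress = true;
                } else if (event == XMLStreamConstants.CHARACTERS && inAddress) {
                    // With coalescing on, the whole text is expected in one event
                    address = reader.getText();
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "Address".equals(reader.getLocalName())) {
                    inAddress = false;
                }
            }
            reader.close();
            return address;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Person><Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + readAddress(xml));
    }
}
```

Without the property, the text may arrive in several CHARACTERS events, in which case it would have to be appended to a buffer just as in the SAX case.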

Upvotes: 0
