giorgos_412

Reputation: 35

SAX parsing and special characters

I want to parse some data from an XML file using a SAX parser. My XML is as follows:

<categories>
 <cat>Pies &amp; past</cat>
 <cat>Fruits</cat>
</categories>

In order to parse this data I extend DefaultHandler.

The output after parsing is:

cat 1 = Pies

cat 2 = &

cat 3 = past

cat 4 = Fruits

Why does this happen, instead of the output being:

cat 1 = Pies & past

cat 2 = Fruits

Upvotes: 3

Views: 8639

Answers (2)

Ted Hopp

Reputation: 234797

My guess is that you are treating each call to characters as delivering the complete text for a cat element. You should code your handler so that successive calls to characters accumulate the text, and you only capture it on the endElement event:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CatHandler extends DefaultHandler {
    // Accumulates text across successive characters() calls
    private final StringBuilder chars = new StringBuilder();

    @Override
    public void startElement(String uri, String lName, String qName, Attributes a)
    {
        final String name = qName == null ? lName : qName;
        if ("cat".equals(name)) {
            chars.setLength(0); // start collecting text for a new cat element
        } // else . . .
    }

    @Override
    public void endElement(String uri, String lName, String qName) {
        final String name = qName == null ? lName : qName;
        if ("cat".equals(name)) {
            String catName = chars.toString(); // complete text of the element
            // do something with cat name
        } // else . . .
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        chars.append(ch, start, length);
    }
}
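
For anyone who wants to try the accumulate-then-capture approach end to end, here is a minimal sketch that feeds the XML from the question (collapsed onto one line) through the JDK's default SAX parser. The class name CatDemo and the helper parseCats are made up for illustration, and the sketch assumes the default, non-namespace-aware parser, so it only checks qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original answer.
class CatDemo {
    /** Returns the full text of every cat element in the given XML. */
    static List<String> parseCats(String xml) throws Exception {
        final List<String> cats = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder chars = new StringBuilder();

            @Override
            public void startElement(String uri, String lName, String qName, Attributes a) {
                if ("cat".equals(qName)) {
                    chars.setLength(0); // reset the buffer for a new cat element
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                chars.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String lName, String qName) {
                if ("cat".equals(qName)) {
                    cats.add(chars.toString()); // text is complete only here
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return cats;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<categories><cat>Pies &amp; past</cat><cat>Fruits</cat></categories>";
        List<String> cats = parseCats(xml);
        for (int i = 0; i < cats.size(); i++) {
            System.out.println("cat " + (i + 1) + " = " + cats.get(i));
        }
    }
}
```

Running main prints "cat 1 = Pies & past" followed by "cat 2 = Fruits", regardless of how many characters() calls the parser makes for the first element.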

Upvotes: 10

Brian Agnew

Reputation: 272297

A single characters() call doesn't have to deliver the complete text of an element. Rather, you should collect the text from each characters() call and use the concatenated result in the corresponding endElement() call.

From the doc:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks

(my emphasis)
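
To see the chunking for yourself, the following rough sketch records each characters() call as a separate string. The names ChunkDemo and chunksOf are invented for this illustration. How many chunks you get, and where they are split, depends on the parser implementation; the output in the question suggests that some parsers split around entity references such as &amp;, but that is not guaranteed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original answer.
class ChunkDemo {
    /** Returns one list entry per characters() call made while parsing the XML. */
    static List<String> chunksOf(String xml) throws Exception {
        final List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Copy only the reported slice; the array may be reused by the parser.
                chunks.add(new String(ch, start, length));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        List<String> chunks = chunksOf("<cat>Pies &amp; past</cat>");
        // The chunk count is parser-dependent; only the concatenation is reliable.
        System.out.println("chunks: " + chunks);
        System.out.println("joined: " + String.join("", chunks));
    }
}
```

Whatever the chunk boundaries turn out to be, joining the chunks always yields "Pies & past", which is why accumulating until endElement() is the safe approach.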

Upvotes: 3
