SAXParser implementation is skipping entities

I have an implementation of org.xml.sax.helpers.DefaultHandler. It works fine except when it encounters something like this:

<NAME>Ji&#345;&#237; B&#225;rta</NAME>

The characters method is overridden as:


@Override
public void characters(char[] ch, int start, int length) throws SAXException {
    if (currentElement) {
        currentValue = new String(ch, start, length);
        currentElement = false;
    }
}

But the char array that arrives at the method contains only 'Ji'; the rest of the string is skipped. I have another method that converts those entities to UTF-8, but I never receive them, so I can't convert anything.

Upvotes: 2

Answers (2)

DwB

Reputation: 38318

The behavior you describe is correct; your understanding of it is incorrect.

Try implementing resolveEntity in your Handler class. Interestingly enough, the purpose of resolveEntity is to resolve an entity. The string "Ji&#345;&#237;" starts with two characters "Ji" then contains two entities. "&#345;" is one entity and "&#237;" is another entity.

Another option is to not implement resolveEntity and to implement skippedEntity instead.

Upvotes: 0

forty-two

Reputation: 12817

Be aware that the parser may not deliver all character data in one call. To be safe you must build the string from possibly several characters() invocations, bracketed by startElement()/endElement().
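A minimal sketch of that accumulation pattern, assuming the element of interest is named NAME as in the question and that the parser is not namespace-aware (so the tag name arrives in qName). The class name NameHandler and the helper parseName are made up for illustration and are not part of the original code:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of a <NAME> element,
// even when the parser delivers it in several characters() calls.
public class NameHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean inName = false;
    private String name;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("NAME".equals(qName)) {
            buffer.setLength(0); // start a fresh value for this element
            inName = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) {
            buffer.append(ch, start, length); // append, never overwrite
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("NAME".equals(qName)) {
            name = buffer.toString(); // the value is complete only here
            inName = false;
        }
    }

    public String getName() {
        return name;
    }

    // Hypothetical convenience method: parse an XML string and return the NAME text.
    public static String parseName(String xml) throws Exception {
        NameHandler handler = new NameHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseName("<NAME>Ji&#345;&#237; B&#225;rta</NAME>"));
    }
}
```

The difference from the code in the question is that the flag is only cleared in endElement(), so later chunks are appended instead of being dropped after the first call.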

As a side note, why do you want to convert the "entities" to UTF-8? They are already converted to UTF-16 characters.
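To illustrate the side note: once the parser has decoded the character references, the value is an ordinary Java String, and UTF-8 only matters when you turn it into bytes for output. A small sketch, with the class name Utf8Demo and the method toUtf8 invented for the example:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: a String holds UTF-16 code units; UTF-8 is just an
// encoding you choose when writing the text out as bytes.
public class Utf8Demo {
    public static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same text the parser would deliver for Ji&#345;&#237; B&#225;rta
        String name = "Ji\u0159\u00ed B\u00e1rta";
        byte[] utf8 = toUtf8(name);
        System.out.println(name.length() + " chars, " + utf8.length + " UTF-8 bytes");
    }
}
```

So no separate entity-to-UTF-8 conversion step should be needed on the parsed text; encode only when writing to a file or stream.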

Upvotes: 1
