Curious

Reputation: 163

characters() method in SAX parser

When parsing an XML file with a SAX parser, when exactly does the parser call the characters() method? More specifically, my XML file has many Student tags:

<Student>
  <details>
     /*
       Contains the details of student 
       This piece of text may have many special characters
     */
  </details>
</Student>

I want the details of all students to be stored in an ArrayList. But I found that if the text contains some special characters, characters() is called with indices only up to the special character. How can I overcome this?

Upvotes: 0

Views: 557

Answers (2)

Michael Kay

Reputation: 163322

The parser is entitled to break up a text node anywhere it likes, delivering the text in multiple calls of characters(). It's quite common for parsers to break the text when they see an entity or character reference, but that's just for the implementor's convenience and is not in any way guaranteed.
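Since the text can arrive in pieces, the usual workaround is to append each piece to a buffer and only use the result in endElement(). Below is a minimal sketch of that idea, assuming the element names from the question; the class name DetailsHandler and the parse() helper are hypothetical and only there for illustration, and the trim() call is optional.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <details> element.
class DetailsHandler extends DefaultHandler {
    private final List<String> details = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inDetails = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("details".equals(qName)) {
            inDetails = true;
            buffer.setLength(0); // start with an empty buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append, never assign.
        if (inDetails) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("details".equals(qName)) {
            // Only here is the text guaranteed to be complete.
            details.add(buffer.toString().trim());
            inDetails = false;
        }
    }

    List<String> getDetails() {
        return details;
    }

    // Convenience helper for this sketch: parse XML that is held in a String.
    static List<String> parse(String xml) throws Exception {
        DetailsHandler handler = new DetailsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getDetails();
    }
}
```

With this shape it no longer matters where the parser decides to split the text; each list entry holds the whole content of one details element.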

Upvotes: 3

Joop Eggen

Reputation: 109557

Inside <details>...</details>, characters() may be called several times, each call delivering one part of the text.

The XML file is stored in some encoding; UTF-8 is the default when none is declared. A programmer's editor such as Notepad++ or jEdit will show you the actual encoding, which should match the one declared in the first line of the file:

<?xml version="1.1" encoding="Windows-1252"?>

If you have the XML as a String, it is already Unicode, so the bytes must have been decoded correctly when the text was read; that step may have gone wrong. Parse with a Reader.

The character encoding conversion seems to go wrong.
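As a rough sketch of parsing with a Reader: decode the bytes yourself with the charset you know the file uses, then pass the Reader to the parser through an InputSource. When an InputSource carries a character stream, the parser reads it directly and disregards the encoding declared in the document. The class name ReaderParseDemo and the method parseText() are hypothetical, and the caller is assumed to know the correct charset.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: returns all character data of a document read via a Reader.
class ReaderParseDemo {
    static String parseText(InputStream in, Charset charset) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Append every chunk; the parser may split the text anywhere.
                text.append(ch, start, length);
            }
        };
        // Decode bytes to characters explicitly instead of letting the parser guess.
        try (Reader reader = new InputStreamReader(in, charset)) {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(reader), handler);
        }
        return text.toString();
    }
}
```

For a file on disk, the InputStream would typically come from something like Files.newInputStream(path), with the charset matching what the editor reports for that file.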

Upvotes: 1
