Reputation: 163
When parsing an XML file with a SAX parser, when exactly does the parser call the characters() method? More specifically, my XML file has many Student tags:
<Student>
<details>
/*
Contains the details of student
This piece of text may have many special characters
*/
</details>
</Student>
I want the details of all students to be stored in an ArrayList. But I found that if the text contains some special characters, the characters() method is called with indices only up to the special character. How can I overcome this?
Upvotes: 0
Views: 557
Reputation: 163322
The parser is entitled to break up a text node anywhere it likes, delivering the text in multiple calls of characters(). It's quite common for parsers to break the text when they see an entity or character reference, but that's just for the implementor's convenience and is not in any way guaranteed.
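Since the text can arrive in several pieces, the usual way to cope is to accumulate every characters() call in a buffer and only use the result in endElement(). A minimal sketch, assuming the element names from the question and a non-namespace-aware parser; the class name StudentDetailsHandler and its parse() helper are made up for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <details> element.
public class StudentDetailsHandler extends DefaultHandler {
    private final List<String> details = new ArrayList<>();
    private StringBuilder current; // non-null only while inside <details>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("details".equals(qName)) {
            current = new StringBuilder(); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append, never assign.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("details".equals(qName)) {
            details.add(current.toString().trim()); // the text is complete only here
            current = null;
        }
    }

    public List<String> getDetails() {
        return details;
    }

    // Convenience helper (an assumption, not part of SAX): parse XML held in a String.
    public static List<String> parse(String xml) throws Exception {
        StudentDetailsHandler handler = new StudentDetailsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getDetails();
    }
}
```

With this, it no longer matters where the parser chooses to split the text; the list entry is built from all the pieces.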
Upvotes: 3
Reputation: 109557
Inside <details>...</details>
the characters event might be fired several times, each call delivering only a part of the text.
The XML file is in some encoding, the default being UTF-8. With a programmer's editor like Notepad++ or jEdit you can easily find out which encoding the file actually uses. That should be the same as the one declared in the top line:
<?xml version="1.1" encoding="Windows-1252"?>
If you have the XML as a String then you already have Unicode, so the bytes must have been decoded when the text was read. That step could have been done with the wrong encoding. Parse with a Reader.
The character encoding conversion seems to go wrong.
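If the encoding really is the culprit, one way to parse with a Reader is to decode the bytes yourself with a known charset and hand the Reader to the parser through an InputSource. A rough sketch; the class name EncodingAwareParse and the method textOf are hypothetical, and the charset you pass in is assumed to be the one the file is actually saved in:

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses XML bytes using an explicitly chosen charset.
public class EncodingAwareParse {
    public static String textOf(InputStream in, Charset charset) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // accumulate all text chunks
            }
        };
        // The Reader does the byte-to-char decoding, so the parser receives
        // characters and does not have to detect the encoding on its own.
        try (Reader reader = new InputStreamReader(in, charset)) {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(reader), handler);
        }
        return text.toString();
    }
}
```

For a file on disk this could be called as something like textOf(new FileInputStream(path), Charset.forName("windows-1252")), where path and the charset name are placeholders for your own values.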
Upvotes: 1