How can you tell an XML parser to ignore entities that are referenced but not declared?
I am getting exceptions like this:
org.xml.sax.SAXParseException: The entity "alpha" was referenced, but not declared.
What I want is for the parser to treat the string "&alpha;" as a simple string of characters, not as a character entity.
Also, I have a lot of these entities, so I can't tell the parser to ignore them one at a time.
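For reference, here is a minimal reproduction. It assumes the JDK's built-in SAX parser (the question doesn't say which parser is in use), and `Reproduce`/`parseError` are hypothetical names. It shows that an undeclared entity is a fatal well-formedness error, while one of the predefined entities parses without any declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class Reproduce {
    // Returns the parser's error message, or null if the document parsed cleanly.
    static String parseError(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No DTD declares "alpha", so this document is not well-formed.
        System.out.println(parseError("<greek>&alpha;</greek>"));
        // Prints null: the five predefined entities need no declaration.
        System.out.println(parseError("<greek>&lt;</greek>"));
    }
}
```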
You could write a script (using sed or perl, for example) that uses regexp replacement to preprocess the input documents and escape the ampersands, except where they begin a reference that XML already recognizes: the five predefined entities, any entities you have declared, and numeric character references.
E.g. the script would replace `&` with `&amp;` at the beginning of strings like `&alpha;`, yielding `&amp;alpha;`. But it would leave `&lt;` and `&#160;` alone.
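The same preprocessing step can be sketched in Java with a negative lookahead. This is an illustration under stated assumptions, not a complete solution: `EscapeAmpersands` and `escape` are hypothetical names, and the caller is assumed to pass in the names of any entities the document really declares, since the sketch does not read the DTD itself. Perl regexes support the same lookahead; sed's POSIX regexes do not.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class EscapeAmpersands {
    // The five entities every XML parser knows without a DTD.
    static final Set<String> PREDEFINED = Set.of("amp", "lt", "gt", "quot", "apos");

    // Replaces "&" with "&amp;" unless it starts a predefined entity, an entity
    // named in `declared`, or a numeric character reference such as &#160; or &#xA0;.
    static String escape(String text, Set<String> declared) {
        Set<String> known = new HashSet<>(PREDEFINED);
        known.addAll(declared);
        String names = known.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        // Negative lookahead: match "&" only when no recognized reference follows it.
        Pattern bareAmp = Pattern.compile("&(?!(?:" + names + "|#[0-9]+|#x[0-9A-Fa-f]+);)");
        return bareAmp.matcher(text).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String sample = "<p>&alpha; &lt; &#160; a & b</p>";
        System.out.println(escape(sample, Set.of()));
        // prints: <p>&amp;alpha; &lt; &#160; a &amp; b</p>
        System.out.println(escape(sample, Set.of("alpha")));
        // prints: <p>&alpha; &lt; &#160; a &amp; b</p>
    }
}
```

Because this is plain text substitution, it also rewrites ampersands inside CDATA sections and comments, where `&alpha;` was already literal text. If your input contains those, you would need a slightly smarter script.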
The question you're asking boils down to "How do I get tools that are designed to parse XML (i.e. well-formed XML) to handle non-XML (i.e. not-well-formed XML)?" And the answer will pretty much always be to use non-XML tools first to fix up the input and make it well-formed.
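Putting the two steps together, here is a sketch that assumes the JDK's SAX parser and recognizes only the predefined entities and numeric references. `ParseAfterFixup`, `fixUp` and `textContent` are hypothetical names. Once the input is well-formed, the parser reports `&alpha;` to `characters()` as ordinary text, which is the behavior the question asks for.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ParseAfterFixup {
    // Non-XML step: escape every "&" that does not start a predefined entity or a numeric reference.
    static String fixUp(String xml) {
        return xml.replaceAll("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");
    }

    // XML step: parse with the JDK's SAX parser and collect all character data.
    static String textContent(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so append rather than assign.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String raw = "<greek>&alpha; &lt; &beta;</greek>";
        System.out.println(textContent(fixUp(raw)));
        // prints: &alpha; < &beta;
    }
}
```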