Reputation: 464
I have HTML data that I'm converting into a dom4j document.
I ran into this error:
org.dom4j.DocumentException: Error on line 1 of document : Reference is not allowed in prolog. Nested exception: Reference is not allowed in prolog.
at org.dom4j.io.SAXReader.read(SAXReader.java:482)
at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
at MonTest.main(MonTest.java:21)
Nested exception:
org.xml.sax.SAXParseException: Reference is not allowed in prolog.
The cause was an "&" character that I needed to escape as &amp; before the document could be built.
In XML, it seems that five characters need escaping: <, >, ", & and ' (lt, gt, quot, amp, apos).
However, how can I escape these characters in the text content without also escaping the markup of the elements themselves? For example:
<div id="test" class='toto'>A&A<A"A</div>
should give:
<div id="test" class='toto'>A&amp;A&lt;A&quot;A</div>
and not
&lt;div id=&quot;test&quot; class=&apos;toto&apos;&gt;A&amp;A&lt;A&quot;A&lt;/div&gt;
Thank you,
Upvotes: 1
Views: 6689
Reputation: 12817
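I would look at using a lenient, HTML-aware XMLReader instead of the default XMLReader implementation, something like TagSoup or HTML Tidy. Parsers of that kind are designed to accept real-world HTML, including a bare "&", rather than rejecting the input as malformed XML.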
Upvotes: 2
Reputation: 12222
Escape strings before adding them to the XML document, for example with the StringEscapeUtils.escapeXml method from Apache Commons Lang. Alternatively, use a library to build the XML so that escaping is handled for you, e.g. http://code.google.com/p/joox/.
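
To illustrate the "escape the text, not the markup" idea, here is a minimal sketch using only the JDK. The class XmlTextEscaper and its methods escapeText and div are hypothetical names made up for this example. They are a hand-written stand-in for an escaper such as StringEscapeUtils.escapeXml, not the Commons Lang implementation. The sketch assumes you still have the text and the markup as separate pieces before they are concatenated:

```java
// Hypothetical helper for illustration only; not part of dom4j or Commons Lang.
final class XmlTextEscaper {

    private XmlTextEscaper() {}

    /** Replaces the five XML special characters with their predefined entities. */
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Writes the markup by hand and escapes only the attribute values and the text content. */
    static String div(String id, String cssClass, String rawText) {
        return "<div id=\"" + escapeText(id) + "\" class=\"" + escapeText(cssClass) + "\">"
                + escapeText(rawText) + "</div>";
    }

    public static void main(String[] args) {
        // The tags stay intact; only the text "A&A<A\"A" is escaped.
        System.out.println(div("test", "toto", "A&A<A\"A"));
    }
}
```

The resulting string should then be acceptable to DocumentHelper.parseText, since the "&" in the text is no longer a bare reference. If the text and markup are already mixed in one string, escaping the whole string would also escape the tags, which is where a lenient HTML parser is the better fit.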
Upvotes: 7