Reputation: 21
I'm using JAXB to parse an XML stream. The stream may contain HTML-formatted data. When I unmarshal this XML with JAXB and it contains HTML that is not well-formed XML, such as <BR> with no end tag or an unclosed <P>, I get the following error:
javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 5; columnNumber: 2987; The element type "BR" must be terminated by the matching end-tag </BR>.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
at arserImpl$JAXPSAXParser.parse(Unknown Source)
Is there any way to prevent this HTML-formatted data from being parsed/validated, or to mark some of the data in the XML so that it is treated as a plain String?
Thanks in advance.
Upvotes: 1
Views: 844
Reputation: 43689
You can use something like JTidy to turn your input into valid XML first.
Upvotes: 2
Reputation: 41135
This fails because the input is not well-formed XML: a <BR> without a matching end tag is acceptable in HTML but not in XML. Your best solution would be to make whatever is producing this data produce valid XML.
If you have the ability to preprocess the file, the way to make the parser treat portions of the data as plain text is to put them in a CDATA section.
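As a rough illustration of that preprocessing step, here is a minimal sketch using only the JDK's XML classes. The class name, the helper methods `wrapInCdata` and `textOf`, and the element name `description` are all hypothetical; the sketch assumes the HTML lives inside an element with no attributes that is not nested inside itself and does not already contain CDATA sections. A regex pass like this is fragile on real-world input, so treat it as a starting point rather than a robust fix.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataWrapSketch {

    // Hypothetical helper: wraps the body of every <elementName>...</elementName>
    // in a CDATA section so the XML parser treats it as plain character data.
    // Assumes the element has no attributes and is not nested inside itself.
    static String wrapInCdata(String xml, String elementName) {
        String name = Pattern.quote(elementName);
        Pattern p = Pattern.compile("<" + name + ">(.*?)</" + name + ">", Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // A literal "]]>" would close the CDATA section early,
            // so split it across two adjacent CDATA sections.
            String body = m.group(1).replace("]]>", "]]]]><![CDATA[>");
            String replacement =
                    "<" + elementName + "><![CDATA[" + body + "]]></" + elementName + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Hypothetical helper: parses the XML with the JDK DOM parser and returns
    // the text of the first element with the given name.
    static String textOf(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // "description" is an assumed element name, not taken from the question.
        String raw = "<item><description>Line one<BR>Line two<P>more</description></item>";
        String fixed = wrapInCdata(raw, "description");
        System.out.println(fixed);
        System.out.println(textOf(fixed, "description"));
    }
}
```

Once the HTML sits inside a CDATA section, the parser no longer sees <BR> or <P> as elements, so JAXB should unmarshal the content into an ordinary String property without complaining about missing end tags.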
Upvotes: 0