Reputation: 45
I'm attempting to use JAXB with data that does not technically fit the XML standard; in particular, the names of the elements are technically invalid as they begin with numeric characters. Here's an overview of what the schema looks like.
<xs:element name = "ITEM">
<xs:complexType>
<xs:sequence>
<xs:element name="01" />
<xs:element name="08" />
<xs:element name="10">
<xs:complexType>
<xs:sequence>
<xs:element name="10_A" />
<xs:element name="10_B" />
</xs:sequence>
</xs:complexType>
</xs:element>
...
...Many more elements...
...
</xs:sequence>
</xs:complexType>
</xs:element>
Unfortunately, I don't have the ability to modify this. Since the full ITEM is huge and has many levels of depth, using an automated tool like JAXB to create classes is a must. To do so, I prefixed the names of the elements with a character (in this case, 'm') so that XJC would accept it. I was hoping that at runtime, I could map the XML tags to my Java class in order to unmarshal the input into a Java object. In particular, something like this:
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
"m01",
"m08",
"m10",
...
})
@XmlRootElement(name = "ITEM")
public class ITEM {
@XmlElement(name = "01")
protected String m01;
@XmlElement(name = "08")
protected String m08;
@XmlElement(name = "10")
protected M10 m10;
...
}
M10 would look like:
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
"m10a",
"m10b",
...
})
public static class M10 {
@XmlElement(name = "10_A")
protected String m10a;
@XmlElement(name = "10_B")
protected String m10b;
...
}
I was hoping that JAXB would be able to match the name in each @XmlElement annotation to the tag in the input, but unfortunately this didn't work out for me because JAXB won't have any of this business with improper tags. If anybody is interested, the particular exception is:
org.xml.sax.SAXParseException: The content of elements must consist of well-formed character data or markup
Does anyone have any advice on how to get around this problem? I feel like I could potentially run a regex swap on the input XML before JAXB parses it (and thus bypass this issue completely), but modifying the input in such a way is rather undesirable.
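For reference, a minimal sketch of the regex swap I have in mind is below. It assumes that every tag name starting with a digit should get the same 'm' prefix I used in the schema, and that the input has no comments or CDATA sections containing text such as "<1", which this pattern would also rewrite. The class and method names (TagPrefixer, prefixNumericTags) are made up for illustration.

```java
import java.util.regex.Pattern;

public class TagPrefixer {
    // Matches "<" or "</" when the next character is a digit,
    // i.e. the start of a tag whose name XML would reject.
    private static final Pattern NUMERIC_TAG = Pattern.compile("(</?)(?=[0-9])");

    // Inserts an 'm' in front of every tag name that begins with a digit.
    // Assumption: no comments or CDATA sections contain "<" followed by a digit.
    public static String prefixNumericTags(String xml) {
        return NUMERIC_TAG.matcher(xml).replaceAll("$1m");
    }

    public static void main(String[] args) {
        String raw = "<ITEM><01>Hello World</01><10><10_A>x</10_A></10></ITEM>";
        System.out.println(prefixNumericTags(raw));
    }
}
```

With this approach the @XmlElement names would have to carry the same prefix (for example "m01" and "m10_A") so that they match the rewritten input rather than the original tags.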
Upvotes: 2
Views: 984
Reputation: 163675
It's not "XML that is technically invalid". It's simply not XML. There is no way of processing stuff that follows some of the XML rules but doesn't follow others, except perhaps to find an XML repair tool that turns it into proper XML.
Upvotes: 1
Reputation: 149057
It is not the JAXB (JSR-222) implementation complaining, but the underlying parser being used. The trick will be to find a tolerant XML parser.
StAX
If you can find a StAX (JSR-173) parser capable of handling this content then you could do the following:
import java.io.StringReader;
import javax.xml.bind.*;
import javax.xml.stream.*;
public class Demo {
public static void main(String[] args) throws Exception {
JAXBContext jc = JAXBContext.newInstance(ITEM.class);
Unmarshaller unmarshaller = jc.createUnmarshaller();
StringReader xml = new StringReader("<ITEM><01>Hello World</01></ITEM>");
XMLStreamReader xsr = XMLInputFactory.newFactory().createXMLStreamReader(xml);
ITEM item = (ITEM) unmarshaller.unmarshal(xsr);
}
}
SAX
Or, if you can find a SAX parser that tolerates this content, you can do the following:
import java.io.StringReader;
import javax.xml.bind.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
public class Demo {
public static void main(String[] args) throws Exception {
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true); // JAXB matches elements by namespace URI and local name
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
JAXBContext jc = JAXBContext.newInstance(ITEM.class);
UnmarshallerHandler unmarshallerHandler = jc.createUnmarshaller().getUnmarshallerHandler();
xr.setContentHandler(unmarshallerHandler);
StringReader xml = new StringReader("<ITEM><01>Hello World</01></ITEM>");
InputSource inputSource = new InputSource(xml);
xr.parse(inputSource);
ITEM item = (ITEM) unmarshallerHandler.getResult();
}
}
Upvotes: 2