DragonFax

Reputation: 4805

validating JAXB, but whitespace not ignored

Some code snippets follow.

The Java code doing the JAXB unmarshalling. It is pretty straightforward, copied out of tutorials online.

JAXBContext jc = JAXBContext.newInstance( "xmlreadtest" );
Unmarshaller u = jc.createUnmarshaller();

// setting up for validation.
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
StreamSource schemaSource =  new StreamSource(ReadXml.class.getResource("level.xsd").getFile());
Schema schema = schemaFactory.newSchema(schemaSource);
u.setSchema(schema);

// parsing the xml
URL url = ReadXml.class.getResource("level.xml");
Source sourceRoot = (Source)u.unmarshal(url);

The problem element from the XML file. The element contains nothing but ignorable whitespace. It is badly formatted because it is shown exactly as it appears in the file.

<HashLine _id='FI6'
ppLine='1'
origLine='1'
origFname='level.cpp'>
</HashLine>

The XSD declaration that describes this element.

<xs:element name="HashLine">
  <xs:complexType>
    <xs:attribute name="origLine" type="xs:NMTOKEN" use="required" />
    <xs:attribute name="origFname" type="xs:string" use="required" />
    <xs:attribute name="_id" type="xs:ID" use="required" />
    <xs:attribute name="ppLine" type="xs:NMTOKEN" use="required" />
  </xs:complexType>
</xs:element>

The error is:

[org.xml.sax.SAXParseException: cvc-complex-type.2.1: Element 'HashLine' must have no character or element information item [children], because the type's content type is empty.]

I've verified the error is coming from that element.

It loads fine with no validation. But I need to use validation as I'm going to be doing heavy changes and additions to the application, and I have to be certain everything gets marshaled/unmarshaled properly.

It also works fine if I change the complexType to include a simpleContent with an xs:string extension. But I'm getting this issue from entities all over, of which there are a lot, and in a lot of XSD files. So it's not feasible to base every element in the XML documents on xs:string just to get around this issue.

Even though J2SE 6 uses the SchemaFactory from Apache Xerces, it doesn't seem to accept the 'ignore-whitespace' feature from Xerces (i.e. via schemaFactory.setFeature()).

Upvotes: 4

Views: 4404

Answers (2)

skaffman

Reputation: 403481

I would suggest writing a very simple XSLT transform to strip out the empty content from the specific elements that are causing the problem (e.g. only the HashLine elements). Then add a processing step before you pass the data through JAXB, using TransformerFactory, Transformer, and so on, which "cleans" the data with the XSLT transform. You could add all sorts of cleaning logic to the XSLT for cases where you find other structures in the source XML that JAXB does not handle well.
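As a rough sketch of what that pre-processing step might look like (not tested against the asker's files): the stylesheet below is an identity transform plus one empty template that drops whitespace-only text nodes directly inside HashLine. The class name HashLineCleaner and the method clean are hypothetical names chosen for illustration, and the XML is passed around as a String only to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of JAXB or of the original question.
class HashLineCleaner {

    // Identity transform, plus a template that matches whitespace-only
    // text nodes whose parent is HashLine and emits nothing for them.
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"HashLine/text()[normalize-space(.) = '']\"/>"
      + "</xsl:stylesheet>";

    // Returns the input document with the offending whitespace removed.
    static String clean(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }
}
```

In a real pipeline you would probably skip the intermediate String: transform into a javax.xml.transform.dom.DOMResult and hand its node to Unmarshaller.unmarshal(Node), so the validating unmarshaller only ever sees the cleaned tree. Widening the match pattern (for example to other element names) is how further cleaning rules would be added.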

Upvotes: 2

McDowell

Reputation: 108889

You could use the StAX API to filter out whitespace-only character events prior to validation using an EventFilter:

class WhitespaceFilter implements EventFilter {
  @Override
  public boolean accept(XMLEvent event) {
    return !(event.isCharacters() && ((Characters) event)
        .isWhiteSpace());
  }
}

This can be used to wrap your input:

// strip unwanted whitespace
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLEventReader eventReader = inputFactory
    .createXMLEventReader(ReadXml.class.getResourceAsStream("level.xml"));
eventReader = inputFactory.createFilteredReader(eventReader,
    new WhitespaceFilter());

// parsing the xml
Source sourceRoot = (Source) unmarshaller.unmarshal(eventReader);

//TODO: proper error + stream handling
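To see what the filter does without involving JAXB at all, here is a small self-contained sketch that counts character events with and without the filter. WhitespaceFilterDemo and countCharacterEvents are hypothetical names used only for this illustration; the filter itself is the same logic as above, written as a lambda.

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical demo class, for illustration only.
class WhitespaceFilterDemo {

    // Counts the character events a consumer of the reader would see.
    static int countCharacterEvents(String xml, boolean filtered)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader =
            factory.createXMLEventReader(new StringReader(xml));
        if (filtered) {
            // Reject every character event that is entirely whitespace.
            EventFilter dropWhitespace = event ->
                !(event.isCharacters() && event.asCharacters().isWhiteSpace());
            reader = factory.createFilteredReader(reader, dropWhitespace);
        }
        int count = 0;
        while (reader.hasNext()) {
            if (reader.nextEvent().isCharacters()) {
                count++;
            }
        }
        reader.close();
        return count;
    }
}
```

Note that the filter applies to the whole document, not just to HashLine. Whitespace-only text inside an element whose type really is xs:string would be dropped as well, so this is only appropriate if such content does not matter in your data.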

Upvotes: 4
