Reputation: 4969
I've been trying to improve the performance of XML validation against an XSD schema, in particular with respect to XSD unique constraints, and decided to give Woodstox a try. I pretty much follow this example, except that I change XMLValidationSchema.SCHEMA_ID_DTD to XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA.
The problem I experience, though, is that the Woodstox validator reports an XML document as valid even if it violates a unique constraint. When I switch my Java code back to the "regular" javax.xml.validation.Schema, javax.xml.stream.XMLStreamReader, and javax.xml.validation.Validator, the uniqueness constraint violation is detected properly.
Also, I can confirm that the Woodstox validator does do something: for example, it reports 'XML invalid' when the XML contains a negative number where the XSD expects a positive one.
Could it be that the tools I've been using until now handle uniqueness checks on unqualified elements correctly, whereas Woodstox does not? It's the only idea that comes to mind...
Also, can anyone confirm whether Woodstox would perform better than Xerces-based tools (pretty much everything existing in Java?) when validating unique constraints?
Any help greatly appreciated!
Upvotes: 0
Views: 974
Reputation: 116600
Not sure if this helps, but Woodstox uses the Multi-Schema Validator (MSV, https://github.com/xmlark/msv) for its XML Schema and Relax NG validation. So if MSV supports validation of uniqueness constraints, then Woodstox should as well.
I do not remember off-hand if this is the case, but the only limitation of MSV I remember is that it does not support assignment of default values (since MSV's position is that it does not do document modification but just validation), so it would seem this should work.
As to performance: since Woodstox is fully streaming and never builds a tree model (like DOM), its cost scales linearly with document size, so it could be more efficient for larger documents. But since validation is done via MSV, it is difficult to say for sure. The big question is whether Xerces validates against a DOM tree (where building the tree is the costly part) or is able to validate directly from a SAX parser.
One thing I would recommend is filing a bug against Woodstox at https://github.com/FasterXML/woodstox (version 5 is moving to GitHub; alternatively, you can file a Jira issue at http://woodstox.codehaus.org), since it is theoretically possible that something in the MSV integration is not fully working. If so, a small example/test case would be welcome.
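As a starting point for such a test case, here is a minimal sketch of the JAXP baseline the question describes (javax.xml.validation.Schema and Validator fed from a javax.xml.stream.XMLStreamReader). The schema, the element names (items, item, id) and the class name UniqueConstraintRepro are made up for illustration and are not taken from the original post; the same schema and documents could then be run through the Woodstox validator to compare the results.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical reproduction class; all names below are illustrative assumptions.
final class UniqueConstraintRepro {

    // Unqualified local element 'item' whose 'id' attribute must be a
    // positive integer and unique within the enclosing 'items' element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='items'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='item' maxOccurs='unbounded'>"
      + "<xs:complexType>"
      + "<xs:attribute name='id' type='xs:positiveInteger' use='required'/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:unique name='uniqueItemId'>"
      + "<xs:selector xpath='item'/><xs:field xpath='@id'/>"
      + "</xs:unique>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the document validates, false if the validator
    // reports a schema violation (thrown as a SAXException by default).
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                // Validation is driven by the StAX reader; no DOM is built here.
                validator.validate(new StAXSource(reader));
                return true;
            } finally {
                reader.close();
            }
        } catch (SAXException e) {
            return false;
        } catch (XMLStreamException | IOException e) {
            throw new IllegalStateException("Could not read the test document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("distinct ids:  "
            + isValid("<items><item id='1'/><item id='2'/></items>"));
        System.out.println("duplicate ids: "
            + isValid("<items><item id='1'/><item id='1'/></items>"));
        System.out.println("negative id:   "
            + isValid("<items><item id='-1'/></items>"));
    }
}
```

If the Woodstox validator accepts the duplicate-id document while rejecting the negative-id one, that would match the behaviour described in the question and make a compact bug report.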
Upvotes: 1