Ti Strga

Reputation: 1390

Suppressing validation for a subsection of XML

We have a bog-standard javax.xml.* parser that slurps up a big XML file and tries to validate it against a custom DTD. The DTD is stored locally, and we're validating using transformers like in this post from some years back.

All of that works. The trouble we're seeing now is that the XML format for this type of file is written by the devil. I'm not kidding; the specification is over 750 pages and is signed "Love, Satan."

Specifically, part of the XML looks like this:

<KnownTag>
    <ArbitraryTag> ... text ... </ArbitraryTag>
    <Whatever>     ... text ... </Whatever>
    <fj9e8jer23tj> ... text ... </fj9e8jer23tj>
    ....
</KnownTag>

The inner tags are balanced -- the raw syntax is known to be well-formed XML at this point -- but the element names themselves are completely arbitrary and unpredictable. (Yes, it's that evil. The company that originally published this spec has long since gone out of business because their products were notoriously unreliable. Go figure.)

Our custom DTD can specify <!ELEMENT KnownTag ANY>, but we're having fits with the content. Obviously the validating parser gives errors as soon as it hits the first user-specified element name (element type "ArbitraryTag" must be declared), and obviously we can't truly "validate" anything inside that block from a purely parsing context. I'm hoping to find some way of suppressing the errors for just that section of XML.

Upvotes: 2

Views: 116

Answers (1)

C. M. Sperberg-McQueen

Reputation: 25034

DTDs have no mechanism for skipping validation on particular subtrees of well-formed XML; that's one of the differences between DTDs and later schema languages like XSD and Relax NG, which introduce wildcards to make it possible to say things like "The KnownTag element can contain arbitrary XML" (or: arbitrary elements not in a particular namespace, or in any of a particular set of namespaces, or ...).
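To illustrate what such a wildcard looks like, here is a minimal sketch using the JDK's built-in javax.xml.validation API. It assumes you would be willing to express the relevant part of the grammar in XSD rather than in the DTD; the Root wrapper element, the class name WildcardDemo, and the helper isValid are made up for the example and are not from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class WildcardDemo {
    // Hypothetical schema: Root contains exactly one KnownTag, and KnownTag
    // may contain any number of arbitrary elements. processContents='skip'
    // asks the validator to check those children for well-formedness only.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Root'><xs:complexType><xs:sequence>"
      + "<xs:element name='KnownTag'><xs:complexType><xs:sequence>"
      + "<xs:any processContents='skip' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the document validates against XSD, false otherwise.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler installed, validate() throws on the first error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<Root><KnownTag><ArbitraryTag> text </ArbitraryTag>"
                  + "<fj9e8jer23tj> text </fj9e8jer23tj></KnownTag></Root>";
        String bad = "<Root><Bogus/></Root>";
        System.out.println(isValid(ok));
        System.out.println(isValid(bad));
    }
}
```

Everything outside KnownTag is still validated strictly; only the children matched by the wildcard are skipped.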

Whether your parser has a facility to turn error reporting off in a particular subtree is entirely parser-specific; you'll need to describe just which of the many Java-based XML parsers you are using. The chances are slim; it's not impossible for a parser to have such a feature, but at first description it doesn't sound like spec-conformant behavior. (It's also not a feature I've ever heard of a DTD-based validator having, but that doesn't actually prove much.)
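Failing a parser feature, one application-level workaround is to leave validation on and filter the error reports yourself. The sketch below is not a parser facility and makes several assumptions: that you can validate with a SAX parser (javax.xml.parsers) rather than the transformer setup in the question, and that the parser delivers each validity error before the startElement or endElement event of the element it concerns. The JDK's bundled parser appears to behave that way, but SAX does not guarantee the ordering. The class SubtreeErrorFilter and its fields are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SubtreeErrorFilter extends DefaultHandler {
    private final String lenientElement; // e.g. "KnownTag"
    private int depth = 0;               // > 0 while inside the lenient element
    final List<String> reported = new ArrayList<>();
    final List<String> suppressed = new ArrayList<>();

    SubtreeErrorFilter(String lenientElement) {
        this.lenientElement = lenientElement;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start counting at the lenient element and keep counting below it.
        if (depth > 0 || qName.equals(lenientElement)) {
            depth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 0) {
            depth--;
        }
    }

    @Override
    public void error(SAXParseException e) {
        // Validity errors raised while inside the lenient subtree are set aside.
        // Fatal (well-formedness) errors still propagate via DefaultHandler.
        (depth > 0 ? suppressed : reported).add(e.getMessage());
    }

    // Parses xml with DTD validation on and returns the handler for inspection.
    static SubtreeErrorFilter parse(String xml, String lenientElement) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        SubtreeErrorFilter handler = new SubtreeErrorFilter(lenientElement);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE Root [<!ELEMENT Root (KnownTag)><!ELEMENT KnownTag ANY>]>"
                   + "<Root><KnownTag><ArbitraryTag> text </ArbitraryTag></KnownTag></Root>";
        SubtreeErrorFilter result = parse(xml, "KnownTag");
        System.out.println("reported:   " + result.reported);
        System.out.println("suppressed: " + result.suppressed);
    }
}
```

Note that the parser is still validating the subtree; the handler merely sorts the resulting reports, so it is worth logging the suppressed list at least once to confirm that only the expected "must be declared" messages end up there.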

Upvotes: 1
