Paul de Vrieze

Reputation: 4918

How do I stop the Sun JDK 1.6 built-in StAX parser from resolving DTD entities?

I'm using the StAX event-based APIs to modify an XML stream.

The stream represents an HTML document, complete with a DTD declaration. I would like to copy this declaration into the output document (written using an XMLEventWriter).

When I ask the factory to disregard DTDs, it does not download the DTD, but it removes the whole statement and leaves only a "<!DOCUMENTTYPE" string. When I don't disregard them, the whole DTD is downloaded and included verbatim when the DTD event is written out.

I don't want to spend time downloading this DTD, but I do want to include the complete DTD declaration in the output (resolving entities is already disabled, and I don't need it). How can I disable the fetching of external DTDs?
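For reference, the copying setup described in the question can be sketched roughly like this. It is a minimal sketch, assuming the default factories from `XMLInputFactory.newInstance()` / `XMLOutputFactory.newInstance()`; the class name `EventCopy` and the `supportDtd` parameter are illustrative, and the asker's entity-related settings are omitted because the question does not show them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical helper illustrating the question's setup: read events, write them back out.
public class EventCopy {

    // Copies every event of xml into a string. supportDtd mirrors the
    // XMLInputFactory.SUPPORT_DTD switch the question toggles.
    public static String copy(String xml, boolean supportDtd) throws XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, supportDtd);
        XMLEventReader reader = inFactory.createXMLEventReader(new StringReader(xml));

        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        // add(XMLEventReader) drains the reader and writes each event, the DTD event included.
        writer.add(reader);
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }
}
```

With `supportDtd` set to `true` and a DOCTYPE that names an external system ID, this is the point at which the built-in parser goes to the network for the DTD.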

Upvotes: 2

Views: 2074

Answers (2)

StaxMan

Reputation: 116512

Also: if so far you have been using the default Sun StAX parser bundled with JDK 1.6, your original approach (setting SUPPORT_DTD to false) might work with Woodstox instead.
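Since the behaviour depends on which StAX implementation is actually loaded, a quick check can help. This sketch assumes the standard `XMLInputFactory.newInstance()` lookup, which picks up Woodstox when its jar is on the classpath (it registers itself as a service) and otherwise falls back to the JDK's built-in parser; the class and method names are illustrative only.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper: shows which StAX implementation is in use and
// applies the SUPPORT_DTD=false setting discussed above.
public class FactoryCheck {

    public static XMLInputFactory dtdFreeFactory() {
        // Lookup order: system property, properties file, service loader, JDK default.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        return factory;
    }

    public static void main(String[] args) {
        // With Woodstox on the classpath this should print com.ctc.wstx.stax.WstxInputFactory;
        // otherwise it prints the JDK's internal factory class.
        System.out.println(dtdFreeFactory().getClass().getName());
    }
}
```

Whether Woodstox then keeps the full DOCTYPE text in the DTD event is, as the answer says, something to verify rather than assume.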

Upvotes: 1

erickson

Reputation: 269667

You should be able to implement a custom XMLResolver that redirects attempts to fetch external DTDs to a local resource (if your code parses only a specific doc type, this is often a class resource right in a JAR).

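import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;

class CustomResolver implements XMLResolver {

  // Called by the parser for external entities, including the external DTD subset.
  public Object resolveEntity(String publicID,
                              String systemID,
                              String baseURI,
                              String namespace)
                  throws XMLStreamException
  {
    if ("The public ID you expect".equals(publicID)) {
      // Serve the DTD from a class resource instead of fetching it over the network.
      return getClass().getResourceAsStream("doc.dtd");
    } else {
      // null lets the parser fall back to its default resolution.
      return null;
    }
  }
}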

Note that some documents include only the system ID, so you should fall back to checking that. The problem with the system identifier is that it is supposed to be a system-specific URL rather than a well-known, stable URI. In practice, though, it is often used as if it were a URI.

See the setXMLResolver method of XMLInputFactory.
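Putting the pieces together, a resolver that checks the public ID first, falls back to the system ID, and is installed with `XMLInputFactory.setXMLResolver` might look like the sketch below. It is illustrative only: `LocalDtdResolver`, `mapPublicId`, `mapSystemId` and `parseAll` are hypothetical names, and the DTD text is held in memory instead of the class resource used above.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;

// Hypothetical resolver: serves DTD text from memory, keyed by public ID first
// and by system ID as a fallback, so known DTDs are never fetched remotely.
public class LocalDtdResolver implements XMLResolver {
    private final Map<String, String> byPublicId = new HashMap<>();
    private final Map<String, String> bySystemId = new HashMap<>();
    private int hits = 0; // number of entities served locally

    public LocalDtdResolver mapPublicId(String publicId, String dtdText) {
        byPublicId.put(publicId, dtdText);
        return this;
    }

    public LocalDtdResolver mapSystemId(String systemId, String dtdText) {
        bySystemId.put(systemId, dtdText);
        return this;
    }

    public int hits() {
        return hits;
    }

    @Override
    public Object resolveEntity(String publicID, String systemID,
                                String baseURI, String namespace)
            throws XMLStreamException {
        String dtd = (publicID != null) ? byPublicId.get(publicID) : null;
        if (dtd == null && systemID != null) {
            dtd = bySystemId.get(systemID); // fall back to the system ID
        }
        if (dtd == null) {
            return null; // unknown entity: let the parser use its default resolution
        }
        hits++;
        return new ByteArrayInputStream(dtd.getBytes(StandardCharsets.UTF_8));
    }

    // Reads every event of the document with this resolver installed on the factory.
    public static void parseAll(String xml, XMLResolver resolver) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        factory.setXMLResolver(resolver);
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            reader.nextEvent();
        }
        reader.close();
    }
}
```

Keying on the public ID first follows the answer's advice; the system-ID map is only consulted when the public ID is absent or unknown, which covers documents that declare only a system ID.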

Upvotes: 4
