Reputation: 12781
I'm using the SAX parser that comes with JDK 7. I'm trying to get hold of the DOCTYPE declaration, but none of the methods in DefaultHandler seem to be fired for it. What am I missing?
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class Problem {
    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE HTML><html><head></head><body></body></html>";
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));
        saxParser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("Element: " + qName);
            }
        });
    }
}
This produces:
Element: html
Element: head
Element: body
I want it to produce:
DocType: HTML
Element: html
Element: head
Element: body
How do I get the DocType?
Update: Looks like there's a DefaultHandler2 class to extend. Can I use that as a drop-in replacement?
Upvotes: 2
Views: 1406
Reputation: 66796
Instead of DefaultHandler, use org.xml.sax.ext.DefaultHandler2, which has the startDTD() method. From its documentation:
Report the start of DTD declarations, if any. This method is intended to report the beginning of the DOCTYPE declaration; if the document has no DOCTYPE declaration, this method will not be invoked.
All declarations reported through DTDHandler or DeclHandler events must appear between the startDTD and endDTD events. Declarations are assumed to belong to the internal DTD subset unless they appear between startEntity and endEntity events. Comments and processing instructions from the DTD should also be reported between the startDTD and endDTD events, in their original order of (logical) occurrence; they are not required to appear in their correct locations relative to DTDHandler or DeclHandler events, however.
Note that the start/endDTD events will appear within the start/endDocument events from ContentHandler and before the first startElement event.
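However, extending DefaultHandler2 is not enough on its own. You must also register the handler as the LexicalHandler on the parser through the http://xml.org/sax/properties/lexical-handler property; otherwise startDTD() is never called.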
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;
public class Problem {
    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html><html><img/></html>";
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));
        DefaultHandler2 myHandler = new DefaultHandler2() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("Element: " + qName);
            }

            // LexicalHandler callback: fired for the DOCTYPE declaration.
            @Override
            public void startDTD(String name, String publicId,
                    String systemId) throws SAXException {
                System.out.println("DocType: " + name);
            }
        };
        // Without this property the parser never delivers lexical events.
        saxParser.setProperty("http://xml.org/sax/properties/lexical-handler",
                myHandler);
        saxParser.parse(in, myHandler);
    }
}
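As for the update in the question: DefaultHandler2 extends DefaultHandler, so it can be passed anywhere a DefaultHandler is expected. The only extra step is the lexical-handler property. To see where the DTD events land relative to the others, here is a minimal sketch that records the callbacks in the order they fire. The class name DoctypeEvents and the helper collect() are made up for illustration; everything else is the standard JDK SAX API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class DoctypeEvents {
    // Hypothetical helper: parses the XML string and returns the names of
    // the SAX callbacks in the order the parser invoked them.
    static List<String> collect(String xml) throws Exception {
        final List<String> events = new ArrayList<String>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startDTD(String name, String publicId, String systemId) {
                // publicId and systemId are null for a bare <!DOCTYPE html>.
                events.add("startDTD " + name);
            }

            @Override
            public void endDTD() {
                events.add("endDTD");
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Register the same object as the LexicalHandler so the DTD
        // callbacks are delivered in addition to the content callbacks.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collect("<!DOCTYPE html><html><head/></html>")) {
            System.out.println(event);
        }
    }
}
```

The DTD events should show up after startDocument and before the first startElement, as the quoted documentation describes. The sample input deliberately has no system identifier, so the parser has no external DTD to fetch.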
Upvotes: 3