Horrible Performance Parsing XHTML File with Doctype as XML Document

Question

When I parse this XHTML file as XML, parsing takes approximately 2 minutes, even though the file is very simple. I have found that if I remove the doctype declaration, it parses almost instantaneously. What is causing this file to take so long to parse?

Java Example

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware( true );
DocumentBuilder bob = dbf.newDocumentBuilder();
Document template = bob.parse( new InputSource( new FileReader( xmlFile ) ) );

XHTML Example




    Test
    
        Test
        Hello, World!
        Text

Thanks

Edit: Solution

To fix the problem, using the explanation of why it was happening, I took these basic steps:

1. Downloaded the DTD-related files to a src/main/resources folder
2. Created a custom EntityResolver to read these files from the classpath
3. Told my DocumentBuilder to use the new EntityResolver

I referenced this SO answer in doing so: how to validate XML using java?

New EntityResolver

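import java.io.IOException;
import java.io.InputStream;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Resolves the XHTML DTD and the entity files it includes from the classpath
 * (e.g. src/main/resources) instead of downloading them from w3.org.
 */
public class LocalXhtmlDtdEntityResolver implements EntityResolver {

    @Override
    public InputSource resolveEntity( String publicId, String systemId )
            throws SAXException, IOException {
        // Use only the last path segment, e.g. ".../xhtml1-strict.dtd" -> "xhtml1-strict.dtd"
        String fileName = systemId.substring( systemId.lastIndexOf( '/' ) + 1 );
        InputStream in = getClass().getClassLoader().getResourceAsStream( fileName );
        if ( in == null ) {
            // Not available locally: returning null lets the parser fall back to its default resolution
            return null;
        }
        InputSource source = new InputSource( in );
        // Keep the original system ID so relative references inside the DTD still resolve
        source.setSystemId( systemId );
        return source;
    }

}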

How to use new EntityResolver:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware( true );
DocumentBuilder bob = dbf.newDocumentBuilder();
bob.setEntityResolver( new LocalXhtmlDtdEntityResolver() );
// Passing the File directly lets the parser detect the file's encoding and close the stream itself
Document template = bob.parse( xmlFile );
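The resolver approach can be checked without the real W3C files or a network connection. The sketch below is hypothetical: InMemoryDtdDemo, MapEntityResolver, and the one-line DTD are stand-ins invented for illustration, with a Map playing the role of the files in src/main/resources. It shows the resolver intercepting the DTD request, and the &nbsp; entity declared in the local DTD still being expanded, which is the reason to ship the DTD files rather than skip them.

```java
import java.io.StringReader;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class InMemoryDtdDemo {

    // Hypothetical stand-in for the files copied into src/main/resources:
    // file name -> DTD text. The real XHTML DTD is far larger.
    static final Map<String, String> LOCAL_FILES =
            Map.of("xhtml1-strict.dtd", "<!ENTITY nbsp \"&#160;\">");

    static final String SAMPLE =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>a&nbsp;b</p></body></html>";

    // Same idea as LocalXhtmlDtdEntityResolver, but reading from the map.
    public static class MapEntityResolver implements EntityResolver {
        private int hits = 0;

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
            String content = LOCAL_FILES.get(fileName);
            if (content == null) {
                return null; // unknown entity: fall back to the parser's default behaviour
            }
            hits++;
            InputSource source = new InputSource(new StringReader(content));
            source.setSystemId(systemId);
            return source;
        }

        public int hits() {
            return hits;
        }
    }

    public static Document parse(String xml, MapEntityResolver resolver) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder bob = dbf.newDocumentBuilder();
        bob.setEntityResolver(resolver);
        return bob.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        MapEntityResolver resolver = new MapEntityResolver();
        Document doc = parse(SAMPLE, resolver);
        System.out.println("resolver hits: " + resolver.hits());
        System.out.println("text: " + doc.getDocumentElement().getTextContent());
    }
}
```

Running main should report one resolver hit, with no request leaving the machine. A real XHTML DTD pulls in further entity files, and each of those triggers another resolveEntity call, which is why all of them need to be available locally.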

Charlie · Accepted Answer

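Java is downloading the specified DTD and the files it includes. Even with validation turned off (the DocumentBuilderFactory default), the parser still reads the external DTD so it can apply the entity declarations and attribute defaults it contains. Using Charles proxy, I recorded the following requests and how long each took to load: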
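If the document does not depend on anything the DTD declares, the download can be avoided altogether. A minimal sketch, assuming the JDK's default, Xerces-derived parser, which recognizes the feature http://apache.org/xml/features/nonvalidating/load-external-dtd; with validation off (the default), setting it to false stops the external DTD from being read. SkipExternalDtd and its sample document are hypothetical, and the DOCTYPE deliberately points at an unresolvable .invalid host so the parse would fail if the DTD were fetched. The trade-off: named entities such as &nbsp; that only the DTD declares are no longer available, so documents using them will likely fail to parse; the local EntityResolver solution in the question avoids that.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SkipExternalDtd {

    // Hypothetical sample: the system ID uses a reserved .invalid host, so any fetch would fail.
    static final String SAMPLE =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://example.invalid/never-fetched.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hello, World!</p></body></html>";

    public static Document parseWithoutDtd(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Xerces-specific feature (assumed supported by the JDK's built-in parser):
        // when false, a non-validating parser does not load the external DTD.
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder bob = dbf.newDocumentBuilder();
        return bob.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithoutDtd(SAMPLE);
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

Unlike the resolver approach, this needs no extra files on the classpath, but it only suits documents that use no DTD-declared entities.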
