realsim

Reputation: 1640

How to get XML element information in case of SAXParseException

When validating an XML source against an XSD schema in a standard Java environment, I cannot find a way to get information about the element that failed validation (in many specific cases).

When catching a SAXParseException, the information about the element is gone. When debugging into Xerces' XMLSchemaValidator, I can see that the reason is the specific error message, whose definition does not include any information about the element.

For example (and this is also the case in my Java demo), the "cvc-minInclusive-valid" error is defined this way: cvc-minInclusive-valid: Value ''{0}'' is not facet-valid with respect to minInclusive ''{1}'' for type ''{2}''. https://wiki.xmldation.com/Support/Validator/cvc-mininclusive-valid

What I would prefer is for this kind of message to be produced instead: cvc-type.3.1.3: The value ''{1}'' of element ''{0}'' is not valid. https://wiki.xmldation.com/Support/Validator/cvc-type-3-1-3

When debugging into Xerces' XMLSchemaValidator, I can see that there are two consecutive calls to reportSchemaError(...), the second occurring only if the first one returned without an exception being thrown.

Is there any way to configure the validator to use the second way of reporting OR to enrich the SAXParseException with the element information?

Please see my runnable, copy-and-paste example code below for further explanation:

    String xsd =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n" +
                    "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" version=\"1.0\">" +
                    "<xs:element name=\"demo\">" +
                    "<xs:complexType>" +
                    "<xs:sequence>" +

                    // given are two elements that cannot be < 1
                    "<xs:element name=\"foo\" type=\"xs:positiveInteger\" minOccurs=\"0\" maxOccurs=\"unbounded\" />" +
                    "<xs:element name=\"bar\" type=\"xs:positiveInteger\" minOccurs=\"0\" maxOccurs=\"unbounded\" />" +

                    "</xs:sequence>" +
                    "</xs:complexType>" +
                    "</xs:element>" +
                    "</xs:schema>";

    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                    "<demo>" +

                    "<foo>1</foo>" +
                    // invalid!
                    "<foo>0</foo>" +
                    "<bar>2</bar>" +

                    "</demo>";

    Validator validator = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)))
            .newValidator();


    try {
        validator.validate(new StreamSource(new StringReader(xml)));
    } catch (SAXParseException e) {

        // unfortunately no element or line/column info:
        System.err.println(e.getMessage());

        // better, but still no element info:
        System.err.println(String.format("Line %s -  Column %s - %s",
                e.getLineNumber(),
                e.getColumnNumber(),
                e.getMessage()));
    }

Upvotes: 2

Views: 2655

Answers (3)

Guido Schnepp

Reputation: 21

I know this is old, but the answer from Michael Glavassevich works like a charm! I'm not yet able to upvote or comment, but that answer shows real, deep knowledge.

Upvotes: 0

Michael Glavassevich

Reputation: 1040

This isn't well documented but if you have a recent version of Xerces-J (see SVN Rev 380997), you can validate a DOMSource and query the Validator from your ErrorHandler to retrieve the current Element node that the validator was processing when it reported the error.

For example, you could write an ErrorHandler like:

public class ValidatorErrorHandler implements ErrorHandler {

    private final Validator validator;

    public ValidatorErrorHandler(Validator v) {
        validator = v;
    }

    ...

    @Override
    public void error(SAXParseException spe) throws SAXException {
        Node node = null;
        try {
            // Xerces-specific property: the DOM element being validated right now
            node = (Node)
                validator.getProperty(
                    "http://apache.org/xml/properties/dom/current-element-node");
        }
        catch (SAXException se) {
            // property not recognized or not supported; node stays null
        }
        ...
    }
}

and then invoke the Validator with this ErrorHandler like:

Validator validator = SchemaFactory
        .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
        .newSchema(new StreamSource(new StringReader(xsd)))
        .newValidator();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));
ErrorHandler errorHandler = new ValidatorErrorHandler(validator);
validator.setErrorHandler(errorHandler);
validator.validate(new DOMSource(doc));

to obtain the element where an error occurred.
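
Putting the two fragments above together, here is a minimal, self-contained sketch that applies this approach to the schema and document from the question. The class name ElementAwareValidation, the DEMO_XSD/DEMO_XML constants, and the "element: message" output format are my own illustrative choices, not part of Xerces. It also assumes the validator in use recognizes the current-element-node property; if it does not, the sketch falls back to a placeholder name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class; only the property URI comes from Xerces.
final class ElementAwareValidation {

    static final String CURRENT_ELEMENT =
            "http://apache.org/xml/properties/dom/current-element-node";

    // Same schema and document as in the question, condensed.
    static final String DEMO_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='demo'><xs:complexType><xs:sequence>"
            + "<xs:element name='foo' type='xs:positiveInteger' minOccurs='0' maxOccurs='unbounded'/>"
            + "<xs:element name='bar' type='xs:positiveInteger' minOccurs='0' maxOccurs='unbounded'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";
    static final String DEMO_XML = "<demo><foo>1</foo><foo>0</foo><bar>2</bar></demo>";

    /** Validates xml against xsd; returns one "element: message" entry per error. */
    static List<String> validate(String xsd, String xml) throws Exception {
        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator();

        // The property is only populated when validating a DOMSource.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) {
                errors.add(describe(validator, e));
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        validator.validate(new DOMSource(doc));
        return errors;
    }

    /** Prefixes the message with the name of the element being validated, if available. */
    static String describe(Validator validator, SAXParseException e) {
        String name = "(unknown element)";
        try {
            Object current = validator.getProperty(CURRENT_ELEMENT);
            if (current instanceof Node) {
                name = ((Node) current).getNodeName();
            }
        } catch (SAXException notSupported) {
            // This validator does not know the property; keep the placeholder.
        }
        return name + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        validate(DEMO_XSD, DEMO_XML).forEach(System.err::println);
    }
}
```

Note that a DOM tree carries no line or column information, so with this approach the element node itself is what locates the error.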

Upvotes: 4

julianwki

Reputation: 422

Try using an error handler:

public class LoggingErrorHandler implements ErrorHandler {

    private boolean isValid = true;

    public boolean isValid() {
        return this.isValid;
    }

    @Override
    public void warning(SAXParseException exc) {
        System.err.println(exc);
    }

    @Override
    public void error(SAXParseException exc) {
        System.err.println(exc);
        this.isValid = false;
    }

    @Override
    public void fatalError(SAXParseException exc) throws SAXParseException {
        System.err.println(exc);
        this.isValid = false;
        throw exc;
    }
}

and use it in validator:

        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator();
        LoggingErrorHandler errorHandler = new LoggingErrorHandler();
        validator.setErrorHandler(errorHandler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return errorHandler.isValid();
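
Because such a handler returns from error(...) instead of throwing, the validator should go on to make the second reportSchemaError(...) call described in the question, so a cvc-type.3.1.3 message naming the element is expected to follow the cvc-minInclusive-valid one. The sketch below is one way to check this. It collects the messages rather than printing them; the class name CollectingValidation and the DEMO_XSD/DEMO_XML constants are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper class for illustration.
final class CollectingValidation {

    // Same schema and document as in the question, condensed.
    static final String DEMO_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='demo'><xs:complexType><xs:sequence>"
            + "<xs:element name='foo' type='xs:positiveInteger' minOccurs='0' maxOccurs='unbounded'/>"
            + "<xs:element name='bar' type='xs:positiveInteger' minOccurs='0' maxOccurs='unbounded'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";
    static final String DEMO_XML = "<demo><foo>1</foo><foo>0</foo><bar>2</bar></demo>";

    /** Validates xml against xsd and returns every reported error, in order. */
    static List<String> collectErrors(String xsd, String xml) throws Exception {
        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator();

        List<String> messages = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) {
                // Do not throw: returning lets the validator keep reporting.
                messages.add("Line " + e.getLineNumber()
                        + ", column " + e.getColumnNumber()
                        + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return messages;
    }

    public static void main(String[] args) throws Exception {
        collectErrors(DEMO_XSD, DEMO_XML).forEach(System.err::println);
    }
}
```

The element name then has to be read out of the message text, which is less robust than querying the validator for the node, but it works with a plain StreamSource and keeps line and column numbers.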

Upvotes: 2
