Agoun

Reputation: 364

XSD schema to validate all elements using a common restriction pattern: e.g. allowable characters

I have a number of complex XSD files used to validate incoming XML documents. These XSDs are standards and get updated once in a while.

On top of that, there is a business restriction dictating that only a specific subset of Latin characters, numbers and symbols is allowed in any element or attribute.

Instead of going into each XSD and adding a pattern restriction to each and every simpleType (which is impossible for a number of reasons), my idea is to create a common XSD that is used as a general validation step before the specific XSD validation is applied.

However, I can't find a way to apply the same restriction to a variable set of elements, no matter where and how deep they are declared.

To give you an idea, I thought of using xsd:any and applying the same simpleType like so:

<!-- XSD for allowable characters -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

    <xs:element name="AppHdr" type="AnyRoot"/>

    <xs:simpleType name="AnyElement">
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Za-z0-9]*"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:complexType name="AnyRoot">
        <xs:sequence>
            <xs:any namespace="##any" processContents="skip"
                    minOccurs="0" maxOccurs="unbounded"
                    type="AnyElement"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>

Unfortunately, xs:any does not allow a type attribute, so this is not valid. Maybe this in combination with xs:group could do the trick?

I searched for alternatives but with no luck: all similar approaches I've found involve changing specific elements in the original XSDs with known names and locations. This is a no-go for me.

At first, it seemed like a pretty trivial and common case in my mind, but it turned out to be a rare scenario.

I'd appreciate it if anyone could shed some light on it. Thanks!

[EDIT]:

I finally decided to follow another path instead of XSD: I'm using XSL to detect invalid characters and get a report back if any are found.

<?xml version="1.0" encoding="UTF-8"?>
<!-- XSL for allowable characters: Andreas Gounaris, 2021 -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:devdom="http://example.com/xfunctions/"
                extension-element-prefixes="devdom">
    <xsl:output method="text" media-type="text/plain" indent="no" omit-xml-declaration="yes" encoding="UTF-8"/>

    <!-- Matching document root -->
    <xsl:template match="/">
        <xsl:apply-templates select="*" mode="verify"/>
    </xsl:template>

    <!-- Remove whitespace between nodes -->
    <xsl:template match="text()[not(normalize-space())]" mode="verify"/>

    <xsl:template match="*" mode="verify">
        <xsl:apply-templates select="@* | node()" mode="verify"/>
    </xsl:template>

    <!-- Matching attribute and text nodes -->
    <xsl:template match="@* | text()" mode="verify">
        <xsl:variable name="invalidContent" select="devdom:exslt_verify_mx_disclosure_chars(.)"/>
        <xsl:if test="$invalidContent">
            <xsl:value-of select="concat(name(parent::*), ': ', $invalidContent)"/>
            <xsl:value-of select="'|-|'"/>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

The transformer goes through all attributes and text nodes and calls an external XSLT function, exslt_verify_mx_disclosure_chars, with the current node as a parameter. The function uses a regex to match any characters outside the valid set and returns them.

In this example, I'm returning plain text with entries delimited by the string '|-|', but the output could be XML as well.
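
The external function itself is not shown above. As a rough sketch only, the same check could be written in pure XSLT 2.0 with replace(), assuming the allowed set is the [A-Za-z0-9] pattern from the question. The chk namespace and the function name chk:invalid-chars are hypothetical, and note that with this pattern spaces inside text are reported as invalid too.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: pure XSLT 2.0 stand-in for the external function (names are hypothetical) -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:chk="http://example.com/hypothetical/check"
                exclude-result-prefixes="xs chk">
    <xsl:output method="text" encoding="UTF-8"/>

    <!-- Returns the characters of $value outside the allowed set,
         or a zero-length string when every character is allowed -->
    <xsl:function name="chk:invalid-chars" as="xs:string">
        <xsl:param name="value" as="xs:string"/>
        <xsl:sequence select="replace($value, '[A-Za-z0-9]', '')"/>
    </xsl:function>

    <!-- Visit every attribute and every non-whitespace text node in document order -->
    <xsl:template match="/">
        <xsl:for-each select="//@* | //text()[normalize-space()]">
            <xsl:variable name="invalidContent" select="chk:invalid-chars(.)"/>
            <!-- A zero-length string is false, so only offending nodes are reported -->
            <xsl:if test="$invalidContent">
                <xsl:value-of select="concat(name(parent::*), ': ', $invalidContent, '|-|')"/>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

Keeping the check inside the stylesheet avoids a processor-specific extension binding, at the cost of hard-coding the allowed character set in the XSLT.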

Upvotes: 1

Views: 659

Answers (2)

Michael Kay

Reputation: 163332

If this is mixed content then you definitely can't do it (without XSD 1.1 assertions) - there's no way of restricting the text that appears in mixed content.
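
For illustration, an XSD 1.1 assertion on a mixed-content type might look like the sketch below. The element names Note and Emph are hypothetical, whitespace is allowed via \s as an assumption, and the schema needs an XSD 1.1 processor; an XSD 1.0 validator will reject xs:assert.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

    <!-- Hypothetical mixed-content element; the names are illustrative only -->
    <xs:element name="Note">
        <xs:complexType mixed="true">
            <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element name="Emph" type="xs:string"/>
            </xs:choice>
            <!-- XSD 1.1 only: every text node directly inside Note must match the pattern -->
            <xs:assert test="every $t in text() satisfies matches($t, '^[A-Za-z0-9\s]*$')"/>
        </xs:complexType>
    </xs:element>

</xs:schema>
```

The assertion covers only the text directly inside Note; the content of Emph is still an unrestricted xs:string and would need its own restriction.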

With simple types, you could make all the (string-based) simple types derive from a subtype of xs:string with the appropriate pattern restriction, but if that's too much refactoring for your taste, then you're going to have to find another way.
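
A minimal sketch of that refactoring, assuming the pattern from the question: the type names RestrictedString and Max35Text and the element Reference are hypothetical. Because a pattern declared in a base type still applies to every type derived from it, the restriction only has to be written once.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

    <!-- Common base type carrying the character restriction -->
    <xs:simpleType name="RestrictedString">
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Za-z0-9]*"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- Hypothetical existing type, rebased from xs:string onto RestrictedString;
         it keeps its own facets and inherits the pattern -->
    <xs:simpleType name="Max35Text">
        <xs:restriction base="RestrictedString">
            <xs:minLength value="1"/>
            <xs:maxLength value="35"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:element name="Reference" type="Max35Text"/>

</xs:schema>
```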

If you're going to do a separate validation pass independent of your XSD validation, then XSD doesn't seem the right technology to do it. If you're in the Java world I'd be tempted to do it using a SAX filter placed between the XML parser and the schema validator.

Upvotes: 2

Martin Honnen

Reputation: 167571

A Schematron schema using XPath 2 or 3 (via the XSLT 2 or 3 query binding) might look like this:

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <pattern>
        <rule context="text()[normalize-space()] | @*">
            <assert test="matches(., '^[A-Za-z0-9]*$')">Only ASCII letters and digits are allowed</assert>
        </rule>
    </pattern>
</schema>

Upvotes: 1
