Java Dev

Reputation: 11

How to validate XML data against blacklisted keywords using XSD?

I am trying to validate XML data against a list of blacklisted (prohibited) words using XML Schema Definition (XSD) in Java. Is there a way to define the blacklisted words in the XSD and raise an error if the XML data matches any word on the list?

An enumeration, as shown below, restricts the data to a predefined list of values and raises an error for anything outside that list.

Ex:
<xs:simpleType name="OrderStatus">
  <xs:restriction base="xs:string">

    <xs:enumeration value="Pending" />
    <xs:enumeration value="Processing" />
    <xs:enumeration value="Shipped" />
    <xs:enumeration value="Delivered" />

  </xs:restriction>
</xs:simpleType>

But my use case is the other way around: I want to define the blacklisted words in the XSD and raise an error if the XML data matches any of them. Is there a predefined facet like the one above, or does this have to be done with a regular expression? I would appreciate it if anyone could post a sample.

Upvotes: 1

Views: 48

Answers (1)

Martin Honnen

Reputation: 167716

It is possible with XSD 1.1 and an assertion, e.g.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="data" type="not-blacklisted" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  
  <xs:simpleType name="not-blacklisted">
    <xs:restriction base="xs:string">
      <xs:assertion test="not($value = ('black', 'list'))"/>
    </xs:restriction>
  </xs:simpleType>
  
</xs:schema>

gives two validation errors on a sample like

<root>
  <data>foo</data>
  <data>bar</data>
  <data>black</data>
  <data>foobar</data>
  <data>list</data>
</root>

errors (raised by Saxon EE)

Validation error on line 4 column 21 of blacklist-instance-invalid1.xml:
  FORG0001: The content "black" of element <data> does not match the required simple type.
  Value "black" contravenes the assertion facet "not($value = ('black', 'list')..." of the
  type Q{}not-blacklisted
  See https://www.w3.org/TR/xmlschema11-2/#cvc-datatype-valid clause 1
Validation error on line 6 column 20 of blacklist-instance-invalid1.xml:
  FORG0001: The content "list" of element <data> does not match the required simple type.
  Value "list" contravenes the assertion facet "not($value = ('black', 'list')..." of the
  type Q{}not-blacklisted
  See https://www.w3.org/TR/xmlschema11-2/#cvc-datatype-valid clause 1

or

[Error] blacklist-instance-invalid1.xml:4:21: cvc-assertions-valid: Value 'black' is not facet-valid with respect to assertion 'not($value = ('black', 'list'))'.
[Error] blacklist-instance-invalid1.xml:4:21: cvc-assertion: Assertion evaluation ('not($value = ('black', 'list'))') for element 'data' on schema type 'not-blacklisted' did not succeed.
[Error] blacklist-instance-invalid1.xml:6:20: cvc-assertions-valid: Value 'list' is not facet-valid with respect to assertion 'not($value = ('black', 'list'))'.
[Error] blacklist-instance-invalid1.xml:6:20: cvc-assertion: Assertion evaluation ('not($value = ('black', 'list'))') for element 'data' on schema type 'not-blacklisted' did not succeed.

with Apache Xerces xerces-2_12_2-xml-schema-1.1. With Java you have (at least) two options for XSD 1.1, the commercial Saxon EE and the open-source XSD 1.1 release of Apache Xerces.
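
If neither XSD 1.1 processor can be added to the project, the same rule can be approximated in plain Java after parsing. This is a different technique from schema validation, not an XSD feature: the class name BlacklistCheck and the method findBlacklisted are hypothetical, and the sketch assumes the values to check sit in elements named data, as in the sample above. It uses only the DOM parser that ships with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library: a plain-Java stand-in for the
// XSD 1.1 assertion not($value = ('black', 'list')).
class BlacklistCheck {

    // Same two words as in the schema above.
    static final Set<String> BLACKLIST = Set.of("black", "list");

    // Returns the content of every <data> element that exactly equals a
    // blacklisted word, in document order. An empty list means no match.
    static List<String> findBlacklisted(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Malformed XML or parser setup failure.
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        NodeList dataElements = doc.getElementsByTagName("data");
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < dataElements.getLength(); i++) {
            String value = dataElements.item(i).getTextContent();
            if (BLACKLIST.contains(value)) {
                hits.add(value);
            }
        }
        return hits;
    }
}
```

Calling BlacklistCheck.findBlacklisted on the sample document above should return the two values black and list. Like the assertion, the comparison is exact and case-sensitive, so "Black" or " black " would pass; unlike the assertion, it reports no line or column numbers.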

Upvotes: 3
