Been using XML for ages now for data storage and transfer, but have never had to validate or transform it. I'm currently starting a new project, making some design decisions, and need to know some rudimentary things about XSL and schemas.
Our XML is like this (excuse the boring book example :) ):
<Books>
<Book>
<ID>1</ID>
<Name>Book1</Name>
<Price>24.??</Price>
<Country>US</Country>
</Book>
<Book>
<ID>1</ID>
<Name></Name>
<Price>24.69</Price>
</Book>
</Books>
Our requirements:
Transformation
a) Turn "US" into United States
b) If Price > 20, create a new element <Expensive>True</Expensive>
I'm guessing this is done with XSLT, but can anyone give me some pointers on how to achieve this?
Validation
a) Is ID an integer, is Price a float? (the most important job, to be honest)
b) Are all tags filled? E.g. the Name tag is empty in book 2 (2nd most important)
c) Are all tags present? E.g. Country is missing for book 2
d) [Probably tricky] Is the ID element unique across all books? (nice to have)
From what I have read, this is done with a Schema or Relax NG, but can the results of the validation be output to a simple HTML page that displays a list of errors?
e.g.
Book 1: Price "24.??" is not a float
Book 2: ID is not unique, Name empty, Country missing
Or would it be better to do these things programmatically in C#? Thanks.
On general XSL education, you may find an XSL primer I wrote some years back useful. It's not current on all the latest trends, but it covers the basics of how an XML document is processed.
This stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:m="map"
exclude-result-prefixes="m">
<xsl:key name="kTestIntID" match="Book"
use="number(ID)=number(ID) and not(contains(ID,'.'))"
m:message="Books with no integer ID"/>
<xsl:key name="kTestFloatPrice" match="Book"
use="number(Price)=number(Price) and contains(Price,'.')"
m:message="Books with no float Price"/>
<xsl:key name="kTestEmptyElement" match="Book"
use="not(*[not(node())])"
m:message="Books with empty element"/>
<xsl:key name="kTestAllElements" match="Book"
use="ID and Name and Price and Country"
m:message="Books with missing element"/>
<xsl:key name="kBookByID" match="Book" use="ID"/>
<m:map from="US" to="United States"/>
<m:map from="CA" to="Canada"/>
<xsl:variable name="vCountry" select="document('')/*/m:map"/>
<xsl:variable name="vKeys" select="document('')/*/xsl:key/@name
[starts-with(.,'kTest')]"/>
<xsl:variable name="vTestNotUniqueID"
select="*/*[key('kBookByID',ID)[2]]"/>
<xsl:template match="/" name="validation">
<xsl:param name="pKeys" select="$vKeys"/>
<xsl:param name="pTest" select="$vTestNotUniqueID"/>
<xsl:param name="pFirst" select="true()"/>
<xsl:choose>
<xsl:when test="$pTest and $pFirst">
<html>
<body>
<xsl:if test="$vTestNotUniqueID">
<h2>Books with no unique ID</h2>
<ul>
<xsl:apply-templates
select="$vTestNotUniqueID"
mode="escape"/>
</ul>
</xsl:if>
<xsl:variable name="vCurrent" select="."/>
<xsl:for-each select="$vKeys">
<xsl:variable name="vKey" select="."/>
<xsl:for-each select="$vCurrent">
<xsl:if test="key($vKey,'false')">
<h2>
<xsl:value-of
select="$vKey/../@m:message"/>
</h2>
<ul>
<xsl:apply-templates
select="key($vKey,'false')"
mode="escape"/>
</ul>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</body>
</html>
</xsl:when>
<xsl:when test="$pKeys">
<xsl:call-template name="validation">
<xsl:with-param name="pKeys"
select="$pKeys[position()!=1]"/>
<xsl:with-param name="pTest"
select="key($pKeys[1],'false')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="Book" mode="escape">
<li>
<xsl:call-template name="escape"/>
</li>
</xsl:template>
<xsl:template match="*" name="escape" mode="escape">
<xsl:value-of select="concat('&lt;',name(),'&gt;')"/>
<xsl:apply-templates mode="escape"/>
<xsl:value-of select="concat('&lt;/',name(),'&gt;')"/>
</xsl:template>
<xsl:template match="text()" mode="escape">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
<!-- Up to here, rules for validation.
From here, rules for transformation -->
<xsl:template match="@*|node()" name="identity">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Country/text()">
<xsl:variable name="vMatch"
select="$vCountry[@from=current()]"/>
<xsl:choose>
<xsl:when test="$vMatch">
<xsl:value-of select="$vMatch/@to"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="Price[. > 20]">
<xsl:call-template name="identity"/>
<Expensive>True</Expensive>
</xsl:template>
</xsl:stylesheet>
With your input, output:
<html>
<body>
<h2>Books with no unique ID</h2>
<ul>
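<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;Book1&lt;/Name&gt;&lt;Price&gt;24.??&lt;/Price&gt;&lt;Country&gt;US&lt;/Country&gt;&lt;/Book&gt;</li>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;&lt;/Name&gt;&lt;Price&gt;24.69&lt;/Price&gt;&lt;/Book&gt;</li>
</ul>
<h2>Books with no float Price</h2>
<ul>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;Book1&lt;/Name&gt;&lt;Price&gt;24.??&lt;/Price&gt;&lt;Country&gt;US&lt;/Country&gt;&lt;/Book&gt;</li>
</ul>
<h2>Books with empty element</h2>
<ul>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;&lt;/Name&gt;&lt;Price&gt;24.69&lt;/Price&gt;&lt;/Book&gt;</li>
</ul>
<h2>Books with missing element</h2>
<ul>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;&lt;/Name&gt;&lt;Price&gt;24.69&lt;/Price&gt;&lt;/Book&gt;</li>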
</ul>
</body>
</html>
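The integer and float keys rely on an XPath 1.0 idiom. `number()` returns NaN for any string that is not a plain numeral, and NaN is the only value not equal to itself. So `number(X)=number(X)` is false exactly when X is not numeric, which is why Book 1 ("24.??") lands under "Books with no float Price".

Below is a rough Python sketch of the same logic, used only as an illustration. It assumes the XPath 1.0 conversion rules: optional whitespace and minus sign, digits with an optional fraction, and no leading plus or exponent. The helper names are hypothetical:

```python
import math
import re

# XPath 1.0 number(): optional whitespace, optional '-', a Number
# (Digits ('.' Digits?)? | '.' Digits), optional whitespace. No '+', no exponent.
_XPATH_NUMBER = re.compile(r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_number(text: str) -> float:
    """Approximate XPath 1.0 number(): the value, or NaN for non-numerals."""
    if _XPATH_NUMBER.fullmatch(text):
        return float(text.strip())
    return math.nan


def is_integer_id(text: str) -> bool:
    # number(ID)=number(ID) and not(contains(ID,'.')); NaN != NaN
    n = xpath_number(text)
    return n == n and "." not in text


def is_float_price(text: str) -> bool:
    # number(Price)=number(Price) and contains(Price,'.')
    n = xpath_number(text)
    return n == n and "." in text


for value in ["1", "24.69", "24.??", "", "1e3"]:
    print(repr(value), is_integer_id(value), is_float_price(value))
```

The `contains(...,'.')` part is a design choice. It makes `24` fail the float test and `1.0` fail the integer test.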
With proper input:
<Books>
<Book>
<ID>1</ID>
<Name>Book1</Name>
<Price>19.50</Price>
<Country>US</Country>
</Book>
<Book>
<ID>2</ID>
<Name>Book2</Name>
<Price>24.69</Price>
<Country>CA</Country>
</Book>
</Books>
Output:
<Books>
<Book>
<ID>1</ID>
<Name>Book1</Name>
<Price>19.50</Price>
<Country>United States</Country>
</Book>
<Book>
<ID>2</ID>
<Name>Book2</Name>
<Price>24.69</Price>
<Expensive>True</Expensive>
<Country>Canada</Country>
</Book>
</Books>
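If you take the programmatic route the question mentions instead, the two transformation rules are small. Here is a Python sketch using only the standard library as a stand-in for C#. The `COUNTRY_NAMES` table and the `transform` name are hypothetical. Python's `float()` is a little more lenient than XPath's `number()` (it accepts exponents, for instance):

```python
import xml.etree.ElementTree as ET

# Hypothetical code-to-name table; same role as the m:map elements above.
COUNTRY_NAMES = {"US": "United States", "CA": "Canada"}


def transform(books: ET.Element) -> ET.Element:
    """Expand country codes and add <Expensive>True</Expensive> after Price > 20."""
    for book in books.findall("Book"):  # findall returns a list, safe to mutate
        country = book.find("Country")
        if country is not None and country.text in COUNTRY_NAMES:
            country.text = COUNTRY_NAMES[country.text]

        price = book.find("Price")
        if price is not None:
            try:
                expensive = float(price.text or "") > 20
            except ValueError:
                expensive = False  # non-numeric Price, like NaN > 20 in XPath
            if expensive:
                flag = ET.Element("Expensive")
                flag.text = "True"
                # Insert right after <Price>, matching the XSLT output order.
                book.insert(list(book).index(price) + 1, flag)
    return books


doc = ET.fromstring(
    "<Books>"
    "<Book><ID>1</ID><Name>Book1</Name><Price>19.50</Price><Country>US</Country></Book>"
    "<Book><ID>2</ID><Name>Book2</Name><Price>24.69</Price><Country>CA</Country></Book>"
    "</Books>"
)
print(ET.tostring(transform(doc), encoding="unicode"))
```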
Note: I'm using keys for performance. This is a proof of concept. In real life, the XHTML output should be wrapped in an xsl:message instruction. From http://www.w3.org/TR/xslt#message:
The xsl:message instruction sends a message in a way that is dependent on the XSLT processor. The content of the xsl:message instruction is a template. The xsl:message is instantiated by instantiating the content to create an XML fragment. This XML fragment is the content of the message.
NOTE: An XSLT processor might implement xsl:message by popping up an alert box or by writing to a log file.
If the terminate attribute has the value yes, then the XSLT processor should terminate processing after sending the message. The default value is no.
Edit: Compacting code and addressing country map issue.
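The report above is grouped by rule. The question asked for one line per book instead (`Book 2: ID is not unique, Name empty, Country missing`). If the checks are done in code, a per-book report plus a minimal HTML page could look like this.

This is a sketch in standard-library Python, used only for illustration since the question mentions C#. The message wording, the `REQUIRED` tuple, and the use of a digit regex and `float()` as the integer/float rules are all assumptions:

```python
import html
import re
import xml.etree.ElementTree as ET
from collections import Counter

REQUIRED = ("ID", "Name", "Price", "Country")  # assumed required children


def book_errors(books: ET.Element) -> list:
    """One 'Book n: problem, problem' line per book that has problems."""
    all_books = books.findall("Book")
    id_counts = Counter((b.findtext("ID") or "").strip() for b in all_books)
    report = []
    for n, book in enumerate(all_books, start=1):
        problems = []
        book_id = (book.findtext("ID") or "").strip()
        if book_id:
            if not re.fullmatch(r"-?[0-9]+", book_id):
                problems.append(f'ID "{book_id}" is not an integer')
            if id_counts[book_id] > 1:
                problems.append("ID is not unique")
        price = (book.findtext("Price") or "").strip()
        if price:
            try:
                float(price)
            except ValueError:
                problems.append(f'Price "{price}" is not a float')
        for name in REQUIRED:
            child = book.find(name)
            if child is None:
                problems.append(f"{name} missing")
            elif not (child.text or "").strip():
                problems.append(f"{name} empty")
        if problems:
            report.append(f"Book {n}: " + ", ".join(problems))
    return report


def to_html(lines: list) -> str:
    """Wrap the report in a minimal HTML page, escaping each line."""
    items = "\n".join(f"<li>{html.escape(line)}</li>" for line in lines)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


doc = ET.fromstring(
    "<Books><Book><ID>1</ID><Name>Book1</Name><Price>24.??</Price>"
    "<Country>US</Country></Book><Book><ID>1</ID><Name></Name>"
    "<Price>24.69</Price></Book></Books>"
)
for line in book_errors(doc):
    print(line)
# Book 1: ID is not unique, Price "24.??" is not a float
# Book 2: ID is not unique, Name empty, Country missing
```

Unlike the stylesheet, this collects every problem instead of stopping at the first failing rule. That matches the list-of-errors page the question describes.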
Edit 2: In real life, with big XML documents and more enterprise-grade tools, the best approach would be to run the transformation with an XSLT 2.0 schema-aware processor for validation, or to run validation independently with well-known schema validators. If for some reason these choices aren't available, don't go with my proof-of-concept answer. Having a key for each validation rule may cause a lot of memory use for big documents. In that last case, a better way is to add rules that catch validation errors and end the transformation with a message. As an example, this stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:m="map"
exclude-result-prefixes="m">
<xsl:key name="kIDByValue" match="ID" use="."/>
<m:map from="US" to="United States"/>
<m:map from="CA" to="Canada"/>
<xsl:variable name="vCountry" select="document('')/*/m:map"/>
<xsl:template name="location">
<xsl:param name="pSteps" select="ancestor-or-self::*"/>
<xsl:if test="$pSteps">
<xsl:call-template name="location">
<xsl:with-param name="pSteps"
select="$pSteps[position()!=last()]"/>
</xsl:call-template>
<xsl:value-of select="concat('/',
name($pSteps[last()]),
'[',
count($pSteps[last()]/
preceding-sibling::*
[name()=
name($pSteps[last()])])
+1,
']')"/>
</xsl:if>
</xsl:template>
<xsl:template match="ID[not(number()=number() and not(contains(.,'.')))]">
<xsl:message terminate="yes">
<xsl:text>No integer ID at </xsl:text>
<xsl:call-template name="location"/>
</xsl:message>
</xsl:template>
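<!-- priority="1": an integer Price such as 25 also matches Price[. > 20]
below; without it the processor would either report a conflict or pick
the later rule and skip this check -->
<xsl:template match="Price[not(number()=number() and contains(.,'.'))]"
priority="1">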
<xsl:message terminate="yes">
<xsl:text>No float Price at </xsl:text>
<xsl:call-template name="location"/>
</xsl:message>
</xsl:template>
<xsl:template match="Book/*[not(node())]">
<xsl:message terminate="yes">
<xsl:text>Empty element at </xsl:text>
<xsl:call-template name="location"/>
</xsl:message>
</xsl:template>
<xsl:template match="Book[not(ID and Name and Price and Country)]">
<xsl:message terminate="yes">
<xsl:text>Missing element at </xsl:text>
<xsl:call-template name="location"/>
</xsl:message>
</xsl:template>
<xsl:template match="ID[key('kIDByValue',.)[2]]">
<xsl:message terminate="yes">
<xsl:text>Duplicate ID at </xsl:text>
<xsl:call-template name="location"/>
</xsl:message>
</xsl:template>
<!-- Up to here, rules for validation.
From here, rules for transformation -->
<xsl:template match="@*|node()" name="identity">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Country/text()">
<xsl:variable name="vMatch"
select="$vCountry[@from=current()]"/>
<xsl:choose>
<xsl:when test="$vMatch">
<xsl:value-of select="$vMatch/@to"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="Price[. > 20]">
<xsl:call-template name="identity"/>
<Expensive>True</Expensive>
</xsl:template>
</xsl:stylesheet>
With your input, this message stops the transformation:
Duplicate ID at /Books[1]/Book[1]/ID[1]
With proper input, outputs the same as before.
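The `location` template builds a path like `/Books[1]/Book[1]/ID[1]` by counting same-named preceding siblings at each step. ElementTree has no parent pointers, so a Python equivalent of this fail-fast variant can build the path on the way down and raise at the first violation.

This is an illustrative sketch only. `ValidationError` and `validate` are hypothetical names. The order of checks is a choice; the stylesheet resolves overlapping rules through template priority instead:

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

REQUIRED = ("ID", "Name", "Price", "Country")
NUMERAL = re.compile(r"\s*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\s*")  # XPath-style


class ValidationError(Exception):
    """Raised at the first violation, like xsl:message terminate="yes"."""


def validate(root: ET.Element) -> None:
    id_counts = Counter(e.text for e in root.iter("ID"))

    def walk(elem: ET.Element, path: str, parent_tag: str) -> None:
        text = elem.text or ""
        if elem.tag == "Book" and any(elem.find(t) is None for t in REQUIRED):
            raise ValidationError(f"Missing element at {path}")
        if parent_tag == "Book" and len(elem) == 0 and not text:
            raise ValidationError(f"Empty element at {path}")
        if elem.tag == "ID" and id_counts[elem.text] > 1:
            raise ValidationError(f"Duplicate ID at {path}")
        if elem.tag == "ID" and (not NUMERAL.fullmatch(text) or "." in text):
            raise ValidationError(f"No integer ID at {path}")
        if elem.tag == "Price" and not (NUMERAL.fullmatch(text) and "." in text):
            raise ValidationError(f"No float Price at {path}")
        # Number children among same-named siblings, like
        # count(preceding-sibling::*[name()=...])+1 in the location template.
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]", elem.tag)

    walk(root, f"/{root.tag}[1]", "")


question_input = (
    "<Books><Book><ID>1</ID><Name>Book1</Name><Price>24.??</Price>"
    "<Country>US</Country></Book><Book><ID>1</ID><Name></Name>"
    "<Price>24.69</Price></Book></Books>"
)
try:
    validate(ET.fromstring(question_input))
except ValidationError as err:
    print(err)  # Duplicate ID at /Books[1]/Book[1]/ID[1]
```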
Here is the RelaxNG schema:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<element name="Books">
<zeroOrMore>
<element name="Book">
<element name="ID"><data type="ID"/></element>
<element name="Name"><text/></element>
<element name="Price"><data type="decimal"/></element>
<element name="Country"><data type="NMTOKEN"/></element>
</element>
</zeroOrMore>
</element>
</start>
</grammar>
And this is the XML Schema version (I think):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="Books">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="Book"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Book">
<xs:complexType>
<xs:sequence>
<xs:element ref="ID"/>
<xs:element ref="Name"/>
<xs:element ref="Price"/>
<xs:element ref="Country"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="ID" type="xs:ID"/>
<xs:element name="Name" type="xs:string"/>
<xs:element name="Price" type="xs:decimal"/>
<xs:element name="Country" type="xs:NMTOKEN"/>
</xs:schema>
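Both schemas rely on built-in XML Schema datatypes, so it helps to know what their lexical forms accept:

- `xs:ID` values must be NCNames. They cannot start with a digit, so an ID of `1` is rejected, and in XML Schema they must also be unique within the document.
- `xs:decimal` allows an optional sign and an optional fractional part but no exponent.
- `xs:string` and Relax NG's `<text/>` accept an empty `Name`.

The Python below is only a rough, ASCII-only approximation of those lexical rules for experimenting. A real schema validator such as xmllint remains the authority:

```python
import re

# ASCII-only approximations of XML Schema lexical spaces (assumption: the
# real NCName/NMTOKEN rules also admit many non-ASCII name characters).
DECIMAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")  # lexical space of xs:ID
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")


def collapse(value: str) -> str:
    """Apply whiteSpace="collapse", which all three types use."""
    return " ".join(value.split())


def check(datatype: str, value: str) -> bool:
    pattern = {"decimal": DECIMAL, "ID": NCNAME, "NMTOKEN": NMTOKEN}[datatype]
    return pattern.fullmatch(collapse(value)) is not None


for datatype, value in [("decimal", "24.69"), ("decimal", "24.??"),
                        ("ID", "1"), ("ID", "b1"), ("NMTOKEN", "US")]:
    print(datatype, repr(value), check(datatype, value))
```

One side effect worth noticing is that "United States" is not a valid NMTOKEN because of the space. These schemas therefore fit the input document, not the transformed output.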
A couple of things to note here:
Running this through xmllint with the original XML document as input (with modified identifiers) gives:
wilfred$ xmllint --noout --relaxng ./books.rng ./books.xml
./books.xml:5: element Price: Relax-NG validity error : Type decimal doesn't allow value '24.??'
./books.xml:5: element Price: Relax-NG validity error : Error validating datatype decimal
./books.xml:5: element Price: Relax-NG validity error : Element Price failed to validate content
./books.xml:8: element Book: Relax-NG validity error : Expecting an element , got nothing
./books.xml fails to validate
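On the question of getting validation results into HTML: xmllint prints errors as `file:line: element Name: ... error : message` lines on stderr. One low-tech option is to capture that output and turn it into a list.

This is a Python sketch for illustration, resting on a few assumptions. It expects the line format shown above, which is xmllint's human-readable output rather than a stable interface, so lines that don't match are skipped. It also assumes `xmllint` is on the PATH, and `books.rng`/`books.xml` are placeholder paths:

```python
import html
import re
import subprocess

# Matches lines such as:
# ./books.xml:5: element Price: Relax-NG validity error : Type decimal doesn't allow value '24.??'
ERROR_LINE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+): element (?P<element>[^:]+): .*? error : (?P<message>.*)$"
)


def parse_errors(output: str) -> list:
    """Return (line, element, message) tuples for each recognised error line."""
    errors = []
    for raw in output.splitlines():
        m = ERROR_LINE.match(raw)
        if m:
            errors.append((int(m["line"]), m["element"], m["message"].strip()))
    return errors


def errors_to_html(errors) -> str:
    items = "\n".join(
        f"<li>Line {line}, {html.escape(element)}: {html.escape(message)}</li>"
        for line, element, message in errors
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


def validate_with_xmllint(schema: str, document: str) -> str:
    # Requires xmllint on PATH; its validation errors go to stderr.
    result = subprocess.run(
        ["xmllint", "--noout", "--relaxng", schema, document],
        capture_output=True, text=True,
    )
    return errors_to_html(parse_errors(result.stderr))
```

Usage would be something like `validate_with_xmllint("books.rng", "books.xml")`, which returns a small HTML page with one `<li>` per error line.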