Ian

Reputation: 137

Trying to get XML nodes with XSLT based on an XSD file

I am looking for a way to validate and then convert a simple XML file, so that I can later extend the approach to a much more complicated file.

I started off learning some XSD to be able to validate an XML document that I have. Once I managed to get the XML document validated I started learning a bit about XSLT, as I want to extract certain data from the XML.

I have stripped down my experiments to a simpler XML file, as follows:

message.xml

<?xml version="1.0" encoding="utf-8"?>
<message xsi:noNamespaceSchemaLocation="message.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <UNA>UNA</UNA>
  <UNB>UNB</UNB>
  <UNH>UNH</UNH>
  <BGM>BGM</BGM>
  <DTM>DTM 1</DTM>
  <DTM>DTM 2</DTM>
  <DTM>DTM 3</DTM>
  <NAD>NAD</NAD>
  <DTM>DTM 4</DTM>
  <NAD>NAD</NAD>
  <DTM>DTM 5</DTM>
  <UNT>UNT</UNT>
  <UNZ>UNZ</UNZ>
</message>

I have validated that it is correct by using the following XSD file.

message.xsd

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="message">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="1" minOccurs="0" name="UNA" type="xs:string" ></xs:element>
                <xs:element maxOccurs="1" minOccurs="1" name="UNB" type="xs:string" />
                <xs:element maxOccurs="1" minOccurs="1" name="UNH" type="xs:string" />
                <xs:element maxOccurs="1" minOccurs="1" name="BGM" type="xs:string" />
                <xs:element maxOccurs="10" minOccurs="1" name="DTM" type="xs:string" />
                <xs:element maxOccurs="5" minOccurs="0" name="FTX" type="xs:string" />
                <xs:group maxOccurs="99" minOccurs="0" ref="SG2" />
                <xs:element name="UNT" type="xs:string" minOccurs="1" maxOccurs="1"> </xs:element>
                <xs:element name="UNZ" type="xs:string" minOccurs="1" maxOccurs="1"></xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:group name="SG2">
        <xs:sequence>
            <xs:element maxOccurs="1" minOccurs="1" name="NAD" type="xs:string" />
            <xs:element name="DTM" type="xs:string" minOccurs="1" maxOccurs="1"></xs:element>
        </xs:sequence>
    </xs:group>



</xs:schema>

The key thing to take from this XSD file is that the DTM node can repeat up to 10 times, and that there is a group, SG2, which contains its own NAD and DTM nodes, independent of the top-level DTM nodes.

I am interested in extracting some of the nodes and so far have come up with this XSLT.

message.xslt

<?xml version="1.0" encoding="UTF-8" ?>

<!-- New document created with EditiX at Wed Oct 07 09:25:50 BST 2015 -->

<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    xmlns:xdt="http://www.w3.org/2005/xpath-datatypes"
    xmlns:err="http://www.w3.org/2005/xqt-errors"
    exclude-result-prefixes="xs xdt err fn">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/message">
        ,
        <xsl:value-of select="UNA"></xsl:value-of>,
        <xsl:value-of select="UNB"></xsl:value-of>,
        <xsl:value-of select="UNH"></xsl:value-of>,
        <xsl:value-of select="BGM"></xsl:value-of>,
        <xsl:value-of select="DTM"></xsl:value-of>,
        <xsl:value-of select="UNT"></xsl:value-of>,
        <xsl:value-of select="UNZ"></xsl:value-of>,

    </xsl:template>

</xsl:stylesheet>

I am currently interested in how to extract the first three DTM elements from the XML (according to the XSD, there could be up to 10 of them), but not the fourth and fifth, as they are part of a group which I am not interested in.

The above XSLT file outputs all the DTM elements, which is not what I want:

, UNA, UNB, UNH, BGM, DTM 1 DTM 2 DTM 3 DTM 4 DTM 5, UNT, UNZ,

What I want is:

, UNA, UNB, UNH, BGM, DTM 1 DTM 2 DTM 3, UNT, UNZ,

If I wanted the DTM 4 and DTM 5 elements, I would expect to select them via the group name SG2 (as defined in the XSD file).

Upvotes: 1

Views: 1098

Answers (2)

Michael Kay

Reputation: 163342

Even when you use schema-aware XSLT, the schema-derived group structure is not accessible to the XSLT programmer (I don't think it's even part of the PSVI, which is the augmented XML produced by the schema processor to capture the results of the validation).

One approach would be to redesign the XML to make the structure more explicit (e.g. by enclosing each DTM/NAD group in an enclosing element).
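
As a rough sketch of what such a redesign might look like, here is the sample message.xml from the question with each NAD/DTM pair wrapped in its own element. The wrapper name SG2 is only an assumption borrowed from the group name in the schema, and message.xsd would have to be changed to declare it as an element rather than a group.

```xml
<?xml version="1.0" encoding="utf-8"?>
<message xsi:noNamespaceSchemaLocation="message.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <UNA>UNA</UNA>
  <UNB>UNB</UNB>
  <UNH>UNH</UNH>
  <BGM>BGM</BGM>
  <!-- Top-level DTM elements remain direct children of message -->
  <DTM>DTM 1</DTM>
  <DTM>DTM 2</DTM>
  <DTM>DTM 3</DTM>
  <!-- Hypothetical wrapper element: one SG2 per NAD/DTM pair -->
  <SG2>
    <NAD>NAD</NAD>
    <DTM>DTM 4</DTM>
  </SG2>
  <SG2>
    <NAD>NAD</NAD>
    <DTM>DTM 5</DTM>
  </SG2>
  <UNT>UNT</UNT>
  <UNZ>UNZ</UNZ>
</message>
```

With this structure, select="DTM" in a template matching /message picks up only DTM 1 to DTM 3, because the child axis does not look inside SG2, while select="SG2/DTM" picks up the grouped ones.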

If you can't do that, you'll need to select the DTM elements you want contextually, which is what @MartinHonnen is proposing. If there's another input for which his approach doesn't work, then please show it.

Upvotes: 1

Martin Honnen

Reputation: 167571

XSLT 2.0 exists in a schema-aware and a non-schema-aware version. I am not very familiar with the details of schema-aware XSLT 2.0, but I don't think it allows you to distinguish in a path expression whether an instance element results from a referenced group or from an inline definition.

The only distinction you can make in XPath is to change <xsl:value-of select="DTM"></xsl:value-of> to <xsl:value-of select="DTM[not(preceding-sibling::NAD)]"/>. I think the required occurrence of NAD in the group is the only thing that makes your schema unambiguous with respect to the different DTM elements, so in XPath we can assume that any DTM element preceded by a NAD sibling is part of the group you don't want to output.
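
Put into a complete stylesheet, that predicate could be used roughly as follows. This is a minimal sketch that assumes an XSLT 2.0 processor (the separator attribute and sequence-valued select need one). Switching to method="text" and dropping the leading and trailing commas are my own assumptions about the intended output format, not something taken from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Text output, since the result is a comma-separated line rather than XML -->
    <xsl:output method="text"/>

    <xsl:template match="/message">
        <!-- Header segments; an absent optional UNA is simply skipped -->
        <xsl:value-of select="UNA, UNB, UNH, BGM" separator=", "/>
        <xsl:text>, </xsl:text>
        <!-- Only the DTM children with no NAD sibling before them,
             i.e. DTM 1 to DTM 3 in the sample message -->
        <xsl:value-of select="DTM[not(preceding-sibling::NAD)]" separator=" "/>
        <xsl:text>, </xsl:text>
        <!-- Trailer segments -->
        <xsl:value-of select="UNT, UNZ" separator=", "/>
    </xsl:template>

</xsl:stylesheet>
```

Applied to the message.xml from the question, this should produce UNA, UNB, UNH, BGM, DTM 1 DTM 2 DTM 3, UNT, UNZ. The complementary expression DTM[preceding-sibling::NAD] would select DTM 4 and DTM 5 instead, which is as close as plain XPath gets to "the DTM elements of group SG2".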

Upvotes: 1
