Reputation: 3
I'm trying to teach myself XSL and XPATH. I have a sample XML document created by one of our commercial tools, and I want to extract certain node values and create a CSV file as output. A truncated example of the XML document is here:
<?xml version="1.0" encoding="windows-1252"?>
<xml_report>
<form id="WOI:WorkOrder" xmlns="http://www.w3.org/2000/xforms">
<mode l>
<group name="field-info" minOccurs="1" maxOccurs="1">
<group name="field" minOccurs="1" maxOccurs="*">
<string name="name" />
<number name="id" long="true" />
<string name="type" range="closed">
<value>CHAR</value>
<value>TIME</value>
<value>DECIMAL</value>
<value>REAL</value>
<value>INT</value>
<value>ENUM</value>
<value>ATTACH</value>
<value>DIARY</value>
<value>TIMEOFDAY</value>
<value>DATE</value>
<value>CURRENCY</value>
<value>NULL</value>
</string>
</group>
<!-- Additional group nodes -->
</group>
</model>
<instance>
<field-info>
<field>
<name>Work Order ID*+</name>
<id>1000000182</id>
<type> CHAR</type>
</field>
<!-- Additional field nodes -->
</field-info>
<entry>
<field_value>
<value>WO0000000498983</value>
</field_value>
<field_value>
<value>New Host name for new server build</value>
</field_value>
</entry>
<!-- Additional entry nodes -->
</instance>
</form>
</xml_report>
I want to extract the contents of the value elements only, filtering out everything else. I've written some pretty unsophisticated XSL to attempt to do this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" indent="yes" encoding="utf-8" media-type="text/plain" />
<xsl:template match="/xml_report/form/instance">
<xsl:for-each select="entry/field_value">
<xsl:value-of select='value' /><xsl:text>,</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Given the example XML, I would expect the following output:
WO0000000498983,New Host name for new server build,
The issue is that I'm actually extracting the values of ALL elements preceding the node list I want to work with, along with unwanted indents and line spacing. I thought that specifying a restrictive XPath expression in the template match and for-each tags would suffice, but it does not. How can I narrow the selected nodes to only those I actually want to use? I'm using Saxon as the XSLT processing engine on Windows 7, if that helps. This is the output I actually get:
CHAR
TIME
DECIMAL
REAL
INT
ENUM
ATTACH
DIARY
TIMEOFDAY
DATE
CURRENCY
NULL
Work Order ID*+
1000000182
CHAR
WO0000000498983
New Host name for new server build
Upvotes: 0
Views: 501
Reputation: 11416
You do not get the desired output because of the namespace in your input XML at the form element:
<form id="WOI:WorkOrder" xmlns="http://www.w3.org/2000/xforms">
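Therefore all elements inside this form are in that namespace, while the unprefixed names in your XSLT (form, instance, entry, value) only match elements in no namespace. Your template never matches, so the built-in template rules take over and copy every text node to the output.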
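To illustrate where the unwanted text comes from: when no template matches a node, the processor falls back on built-in template rules. The stylesheet below is only a sketch that spells out rough equivalents of those rules explicitly; it is not something you need to add. Applied to the example XML, it should reproduce the kind of "everything" output shown in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Rough equivalent of the built-in rule for the document node
       and for elements: keep walking down to the children. -->
  <xsl:template match="/ | *">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Rough equivalent of the built-in rule for text and attribute
       nodes: write out their string value. -->
  <xsl:template match="text() | @*">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Whitespace-only text nodes between the elements are written out as well, which would explain the unwanted indents and blank lines.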
After declaring the namespace in the stylesheet, for example as xmlns:xforms="http://www.w3.org/2000/xforms", the following XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xforms="http://www.w3.org/2000/xforms">
<xsl:output method="text" omit-xml-declaration="yes"
indent="yes" encoding="utf-8" media-type="text/plain" />
<!-- xml_report is in no namespace, so it is matched without a prefix -->
<xsl:template match="/xml_report">
<xsl:apply-templates select="xforms:form"/>
</xsl:template>
<!-- form and everything below it are in the xforms namespace -->
<xsl:template match="xforms:form">
<xsl:apply-templates select="xforms:instance"/>
</xsl:template>
<xsl:template match="xforms:instance">
<xsl:for-each select="xforms:entry/xforms:field_value">
<xsl:value-of select='xforms:value' /><xsl:text>,</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when applied to your example XML (with <mode l> in line 4 corrected to <model>), produces the following output:
WO0000000498983,New Host name for new server build,
To avoid misunderstandings: the prefix xforms in xmlns:xforms is an arbitrary choice; only the namespace URI has to match the one in the input. You could, for example, declare it as xmlns:xfo="http://www.w3.org/2000/xforms" and then change <xsl:apply-templates select="xforms:form"/> into <xsl:apply-templates select="xfo:form"/> (and likewise for the other names currently prefixed with xforms:).
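As a sketch of that renaming, here is a complete stylesheet using the xfo prefix throughout. The prefix name is purely illustrative, and the template layout is one possible arrangement rather than the only one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xfo="http://www.w3.org/2000/xforms">
  <xsl:output method="text" encoding="utf-8"/>
  <!-- xml_report is in no namespace, so it takes no prefix -->
  <xsl:template match="/xml_report">
    <xsl:apply-templates select="xfo:form"/>
  </xsl:template>
  <!-- form and its descendants are in the namespace bound to xfo -->
  <xsl:template match="xfo:form">
    <xsl:apply-templates select="xfo:instance"/>
  </xsl:template>
  <xsl:template match="xfo:instance">
    <xsl:for-each select="xfo:entry/xfo:field_value">
      <xsl:value-of select="xfo:value"/><xsl:text>,</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The prefix in the stylesheet and the (absent) prefix in the input never need to agree, because names are compared by namespace URI plus local name.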
As you are using XSLT 2.0, you could also declare the xforms namespace as the xpath-default-namespace, since you are only targeting elements in this namespace. The adjusted XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/2000/xforms">
<xsl:output method="text" omit-xml-declaration="yes"
indent="yes" encoding="utf-8" media-type="text/plain" />
<xsl:template match="//form">
<xsl:apply-templates select="instance"/>
</xsl:template>
<xsl:template match="instance">
<xsl:for-each select="entry/field_value">
<xsl:value-of select='value' /><xsl:text>,</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
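produces the same output. Because the xforms namespace is declared as xpath-default-namespace, unprefixed element names in XPath expressions and match patterns are resolved in that namespace, so it's not necessary to declare an extra prefix and add it to the element names.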
Another adjustment in this version is to match form instead of xml_report: xml_report is not in the xforms namespace, so with this xpath-default-namespace an unprefixed xml_report in a pattern would no longer match it.
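If you want one CSV line per entry, a possible variation (a sketch, not a drop-in replacement) starts at the document node and uses the * wildcard to step over xml_report regardless of its namespace. It also assumes you would rather not have the trailing comma, so it uses the XSLT 2.0 separator attribute of xsl:value-of; this deliberately differs from the expected output in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/2000/xforms">
  <xsl:output method="text" encoding="utf-8"/>
  <!-- Start at the document node so that no built-in rule gets the
       chance to write out stray whitespace. "*" matches the root
       element whatever its namespace is. -->
  <xsl:template match="/">
    <xsl:apply-templates select="*/form/instance/entry"/>
  </xsl:template>
  <!-- One line per entry: values joined by commas, then a line feed -->
  <xsl:template match="entry">
    <xsl:value-of select="field_value/value" separator=","/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Note that this does not quote or escape values that themselves contain commas, which a real CSV export would have to handle.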
For reference on namespaces, have a look at http://www.w3.org/TR/REC-xml-names/#ns-decl or at the answers to What does "xmlns" in XML mean?
Upvotes: 1