Jing

How to find an element node with an empty string value in XSLT

I am working on transforming an XML file from an old version to a new version. Here is the basic template I am using:

<xsl:template match="*">
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">
      <xsl:copy-of select="@*"></xsl:copy-of>
      <xsl:apply-templates></xsl:apply-templates>
    </xsl:element>
</xsl:template>

However, the new version of the XML schema requires that no element with a text value contain an empty string. So an old XML document such as:

<dataset>
 <title> </title>
</dataset>

will be invalid in the new version. I tried to modify the default template for text nodes. The new text template checks the text node: if it is an empty string, the template terminates the transformation; otherwise it copies the value to the output XML. Here is the template:

<xsl:template match="text()">
    <xsl:variable name="text-value" select="."/>
      <xsl:if test="normalize-space($text-value) = ''">
          <xsl:message terminate="yes">
                <xsl:call-template name="output_message3_fail">
                  <xsl:with-param name="parent_node" select="name(parent::node())"/>
                </xsl:call-template>
          </xsl:message>
      </xsl:if>
      <xsl:value-of select="$text-value"/>
</xsl:template>

However, I found that if the input looks like:

<dataset>
 <title>My tile</title>
</dataset>

the new text template will be called. If the input looks like:

<dataset>
 <title> </title>
</dataset>

the new text template is never called and the output looks like:

<dataset>
     <title/>
</dataset>

So my approach of modifying the text template doesn't work. Do you have any suggestions for how to terminate the transformation when an element containing only an empty string is found?

Thank you very much!

By the way, I am using the Java Xalan XSLT processor.

Upvotes: 2

Views: 2428

Answers (3)

paulmurray

Reputation: 3413

Maybe the test should be something like

string-length(text()) != 0 and string-length(normalize-space(text())) = 0

Doesn't XSLT support regular expressions? If so, then that would be the way to go.

But does he want that every element must contain some nonspace text? Or are there some elements that must contain at least something and other elements where

<foo bar="BAR"/>

is ok? I'll bet anything it is. I think it is likely that he will have to write rules on a case-by-case basis for those elements that must be non-empty.

Which leads me to my final comment: the correct technology for checking the validity of an XML document is an XML schema.

Upvotes: 0

Dimitre Novatchev

Reputation: 243529

However, I found that if the input looks like:

<dataset>
  <title>My tile</title>
</dataset>

the new text template will be called

Yes, this is exactly what the provided code should be doing -- I will explain this in a moment.

If input looks like:

<dataset>
  <title> </title>
</dataset>

the new text template is never called and the output looks like:

<dataset>
  <title/>
</dataset>

I couldn't reproduce this with Xalan (J or C) or with many other XSLT processors I have (Saxon 6.5.3, .NET XslCompiledTransform and XslTransform, MSXML 3, 4, 6, JD, etc.). All of them display an error message (the one inside <xsl:message terminate="yes">).

The only XSLT processor that produces the above output is AltovaXML (XmlSPY).

If you are using XmlSPY, you could consider either trying another XSLT processor or contacting Altova for assistance.

Now, back to the first behavior.

Explanation:

The provided source XML file:

<dataset>
  <title>My tile</title>
</dataset>

has three text nodes:

  1. The first text node is the one between <dataset> and <title> and it contains only whitespace.

  2. The second text node is the only child of <title> and its value is the string "My tile".

  3. The third and last text node is between </title> and </dataset> and consists of only whitespace.

When the template matching text() is selected for processing the first of the above three text nodes, the test is positive and <xsl:message terminate="yes"> is executed -- and this is exactly the reported behavior.
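
To see these text nodes for yourself, a small diagnostic stylesheet can list every text node in the source. This is only a sketch and not part of the original transformation; it assumes the parser hands whitespace-only text nodes to the processor unchanged (some parsers or loading APIs may drop them).

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Visit every text node in document order and describe it. -->
  <xsl:template match="/">
    <xsl:for-each select="//text()">
      <xsl:value-of select="position()"/>
      <xsl:text>: child of </xsl:text>
      <xsl:value-of select="name(..)"/>
      <xsl:text>, </xsl:text>
      <xsl:choose>
        <!-- normalize-space() of a whitespace-only node is the empty string -->
        <xsl:when test="normalize-space() = ''">whitespace-only</xsl:when>
        <xsl:otherwise>"<xsl:value-of select="."/>"</xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the document above, it should report two whitespace-only children of dataset and one "My tile" child of title, which corresponds to the three nodes listed.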

Solution:

A simple solution exists. Just change the template matching text() so that it matches only whitespace-only text nodes that are the only child node of their parent. Now the XSLT transformation behaves as expected for both types of XML documents that were originally provided:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="*">
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match=
    "*[not(node()[2])]/text()
              [normalize-space()='']">
    <xsl:message terminate="yes">
      <xsl:call-template name="output_message3_fail">
        <xsl:with-param name="parent_node" select="name(..)"/>
      </xsl:call-template>
    </xsl:message>
  </xsl:template>

  <xsl:template name="output_message3_fail">
    <xsl:param name="parent_node"/>

    <xsl:message>        ERROR:        
      &lt;<xsl:copy-of select="$parent_node"/>> is empty
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<dataset>
  <title>My tile</title>
</dataset>

The wanted result is produced:

<dataset>
   <title>My tile</title>
</dataset>

When it is applied to the second XML document:

<dataset>
    <title> </title>
</dataset>

the correct result is produced:

ERROR:        
        <title> is empty

Upvotes: 2

Robert Rossney

Reputation: 96830

I'm not clear on what it is you really want. You say you don't want to emit elements that contain the empty string, and then give as an example this:

<dataset>
   <title> </title>
</dataset>

in which the title element doesn't contain the empty string. It contains whitespace. So I'm going to assume that by "empty string" you mean "whitespace only."

Using xsl:strip-space will eliminate whitespace-only text nodes from the source tree before processing it. If you genuinely want to abort the transform with an exception if you encounter an element containing whitespace, you can't use xsl:strip-space, as it will remove all of the exception-triggering conditions before the transform runs.
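
As an illustration of that interaction, here is a hedged sketch that combines the question's copy template with xsl:strip-space and a whitespace check. The message text is made up for the example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Drops whitespace-only text nodes under every element of the source tree. -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="*">
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Never selected here: the nodes it would match were already stripped. -->
  <xsl:template match="text()[normalize-space() = '']">
    <xsl:message terminate="yes">Whitespace-only text in <xsl:value-of select="name(..)"/></xsl:message>
  </xsl:template>
</xsl:stylesheet>
```

With this stylesheet, <title> </title> no longer has a text child by the time templates are applied, so the check stays silent and the element is written out empty. Removing the xsl:strip-space declaration lets the check see the whitespace again.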

I think what you want to do instead is write a template that looks like this:

<xsl:template match="*[not(*) and text() and not(text()[normalize-space() != ''])]">
   ...

This template will match any element for which the following is true:

  • it doesn't have child elements
  • it does contain at least one text node
  • all of the text nodes it contains are whitespace-only

So in your example, it won't match the dataset element (because it has a child element), but it will match the title element. It wouldn't match <title/> or <title></title>, though, because neither of those elements contains text nodes.
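
A minimal sketch of how such a template might sit in a complete stylesheet, assuming the transform should abort on the first offending element. The predicate is written so that each text child is tested individually, which is an assumption about the intent, and the error wording is invented for the example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Generic copy rule from the question. -->
  <xsl:template match="*">
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- No child elements, at least one text node, and no text node with
       anything other than whitespace. The predicate gives this rule a
       higher default priority than match="*", so it wins for such elements. -->
  <xsl:template match="*[not(*) and text() and not(text()[normalize-space() != ''])]">
    <xsl:message terminate="yes">
      <xsl:text>ERROR: &lt;</xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text>&gt; contains only whitespace</xsl:text>
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>
```

Because the check is attached to the element rather than to text(), the whitespace between <dataset> and <title> is simply copied through by the built-in text rule and does not trigger the error.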

Upvotes: 0
