Eugene N.

Reputation: 23

Remove empty elements and descendants based on attributes

I am attempting to create an XSLT that will remove any element, together with all of its descendants, that satisfies either of the following conditions:

  1. Element is empty (no text, or white-space only) and has no attributes.
  2. Element is empty and has one or more attributes that are all empty (no value, or a white-space-only value).

In other words: The only elements that should remain are those that are not empty and those that are empty and have at least one attribute that is not empty.

Additionally, the XSLT must be generic/dynamic enough to process any given XML input and produce results according to the above expectations.

I am currently using the following XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" omit-xml-declaration="no" indent="no" />
    <xsl:strip-space elements="*" />

    <xsl:template match="/">
        <xsl:apply-templates select="*" />
    </xsl:template>

    <xsl:template match="*">
        <xsl:if test="(normalize-space(.)) or (normalize-space(.//@*))">
            <xsl:copy>
                <xsl:element name="name()">
                    <xsl:copy-of select="@*" />
                    <xsl:apply-templates />
                </xsl:element>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

To transform the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<Parents>
    <Parent></Parent>
    <Parent test=""></Parent>
    <Parent test="A"></Parent>
    <Parent>Parent 1</Parent>
    <Parent>
        <Child test=""></Child>
    </Parent>
    <Parent>
        <Child test="A"></Child>
    </Parent>
    <Parent>
        <Child>Child 1</Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild test=""></GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild test="A"></GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>GrandChild 1</GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>
                <GreatGrandChild test=""></GreatGrandChild>
            </GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>
                <GreatGrandChild test="A"></GreatGrandChild>
            </GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>
                <GreatGrandChild>GreatGrandChild 1</GreatGrandChild>
            </GrandChild>
        </Child>
    </Parent>
</Parents>

I am expecting that the above XSLT will transform the given XML as such:

<?xml version="1.0" encoding="UTF-8"?>
<Parents>
    <Parent test="A"></Parent>
    <Parent>Parent 1</Parent>
    <Parent>
        <Child test="A"></Child>
    </Parent>
    <Parent>
        <Child>Child 1</Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild test="A"></GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>GrandChild 1</GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>
                <GreatGrandChild test="A"></GreatGrandChild>
            </GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>
                <GreatGrandChild>GreatGrandChild 1</GreatGrandChild>
            </GrandChild>
        </Child>
    </Parent>
</Parents>

However, it is producing the following undesired result after transformation:

<?xml version="1.0" encoding="UTF-8"?>
<Parents>
    <Parent test="A" />
    <Parent>Parent 1</Parent>
    <Parent>
        <Child>Child 1</Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>GrandChild 1</GrandChild>
        </Child>
    </Parent>
    <Parent>
        <Child>
            <GrandChild>
                <GreatGrandChild>GreatGrandChild 1</GreatGrandChild>
            </GrandChild>
        </Child>
    </Parent>
</Parents>

At first glance, it appeared as though only top-level nodes and their immediate children were being handled correctly by the "(normalize-space(.)) or (normalize-space(.//@*))" condition of the test expression. However, this block from the input XML

    <Parent>
        <Child test="A"></Child>
    </Parent>

was filtered out of the output as well.

I have tried a number of variations for this implementation and this has been the closest I have come to reaching my desired result.


I hope I have clearly provided enough details to describe what I am trying to accomplish. I will gladly provide more details upon request if necessary.

Upvotes: 2

Views: 1180

Answers (1)

Abel

Reputation: 57149

Excellent and clear question, with a clear example of what you are after, and good to see you already tried it yourself.

There are a few things happening with your code:

  • The output you showed does not match the output you would get if you ran your stylesheet, because:
      • you are duplicating each element by first shallow-copying it (xsl:copy) and then doing the same by hand (xsl:element with name());
      • the stylesheet, before I edited it, was invalid: name="name()" is illegal (the value is taken literally, and name() is not a valid element name); it should have been name="{name()}".
  • You are testing with xsl:if and normalize-space on the string value of the element, which is the concatenation of all its descendant text nodes. This is not what you want: you should also test for children, or just test for text nodes (see below).
  • You already strip whitespace-only text nodes with xsl:strip-space, so normalize-space is not needed for the text test. You only need it if you also want to treat whitespace-only attribute values as empty, which your second requirement does.

This is the actual output you would get with the input XML you provided, which is clearly also not what you want:

<?xml version="1.0" encoding="UTF-8"?>
<Parents>
   <Parents>
      <Parent>
         <Parent test="A"/>
      </Parent>
      <Parent>
         <Parent>Parent 1</Parent>
      </Parent>
      <Parent>
         <Parent>
            <Child>
               <Child test="A"/>
            </Child>
         </Parent>
      </Parent>
      <Parent>
         <Parent>
            <Child>
               <Child>Child 1</Child>
            </Child>
         </Parent>
      </Parent>
      <Parent>
         <Parent>
            <Child>
               <Child>
                  <GrandChild>
                     <GrandChild test="A"/>
                  </GrandChild>
               </Child>
            </Child>
         </Parent>
      </Parent>
      <Parent>
         <Parent>
            <Child>
               <Child>
                  <GrandChild>
                     <GrandChild>GrandChild 1</GrandChild>
                  </GrandChild>
               </Child>
            </Child>
         </Parent>
      </Parent>
      <Parent>
         <Parent>
            <Child>
               <Child>
                  <GrandChild>
                     <GrandChild>
                        <GreatGrandChild>
                           <GreatGrandChild test="A"/>
                        </GreatGrandChild>
                     </GrandChild>
                  </GrandChild>
               </Child>
            </Child>
         </Parent>
      </Parent>
      <Parent>
         <Parent>
            <Child>
               <Child>
                  <GrandChild>
                     <GrandChild>
                        <GreatGrandChild>
                           <GreatGrandChild>GreatGrandChild 1</GreatGrandChild>
                        </GreatGrandChild>
                     </GrandChild>
                  </GrandChild>
               </Child>
            </Child>
         </Parent>
      </Parent>
   </Parents>
</Parents>
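If you would rather keep your xsl:if approach, a minimal repair might look like the sketch below (XSLT 1.0 assumed). It makes a single shallow copy with xsl:copy instead of also creating the element by hand, and it tests each attribute with the predicate .//@*[normalize-space()] instead of normalize-space(.//@*), because in XSLT 1.0 normalize-space on a node-set only looks at the first node in document order.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes" />
    <!-- drop whitespace-only text nodes, so they do not count as content -->
    <xsl:strip-space elements="*" />

    <xsl:template match="/">
        <xsl:apply-templates select="*" />
    </xsl:template>

    <xsl:template match="*">
        <!-- keep the element if it contains non-whitespace text at any depth,
             or if it or any descendant has at least one non-empty attribute -->
        <xsl:if test="normalize-space(.) or .//@*[normalize-space()]">
            <!-- one shallow copy is enough; no extra xsl:element needed -->
            <xsl:copy>
                <xsl:copy-of select="@*" />
                <xsl:apply-templates />
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

On your sample input this should keep the same elements as your expected output. Note that comments and processing instructions inside elements are not copied by this version, because the built-in templates for those node types output nothing.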

Here's a solution you can use, which uses the identity-transform (copy) idiom and then overrides those elements and other nodes that you do not want to copy. This is the "typical XSLT way" of doing it, and as you can see, there is no need for any xsl:if branching instruction. The processor does the job for you ;).

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes" />
    <xsl:strip-space elements="*" />

    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
    </xsl:template>

    <!-- remove empty elements, or empty elements with empty attribs -->
    <xsl:template match="*[not(*)][not(text())][normalize-space(@*) = '']" />

</xsl:stylesheet>

The first template copies everything (all kinds of nodes, including comments and processing instructions). The second template, which is deliberately empty, removes everything that matches its pattern, together with its descendants. This works because, when several templates match the same node, the one with the higher priority wins: a pattern with predicates has a default priority of 0.5, while node() and @* have a default priority of -0.5.

The output is as follows. It does not entirely match your expected output, but it does match your description as I understood it:

<?xml version="1.0" encoding="UTF-8"?>
<Parents>
   <Parent test="A"/>
   <Parent>Parent 1</Parent>
   <Parent/>
   <Parent>
      <Child test="A"/>
   </Parent>
   <Parent>
      <Child>Child 1</Child>
   </Parent>
   <Parent>
      <Child/>
   </Parent>
   <Parent>
      <Child>
         <GrandChild test="A"/>
      </Child>
   </Parent>
   <Parent>
      <Child>
         <GrandChild>GrandChild 1</GrandChild>
      </Child>
   </Parent>
   <Parent>
      <Child>
         <GrandChild/>
      </Child>
   </Parent>
   <Parent>
      <Child>
         <GrandChild>
            <GreatGrandChild test="A"/>
         </GrandChild>
      </Child>
   </Parent>
   <Parent>
      <Child>
         <GrandChild>
            <GreatGrandChild>GreatGrandChild 1</GreatGrandChild>
         </GrandChild>
      </Child>
   </Parent>
</Parents>

Update: re-reading your requirements, I see you do mention descendants. So in retrospect, I think you mean the following:

  • Keep an element if it contains non-whitespace text at any depth
  • Keep an element if it or any of its descendants has a non-whitespace attribute value
  • Delete the rest

If that is correct, you can achieve it by checking that there are no descendant text nodes and that the descendant attribute values are empty. Replace the filtering line above with the following:

<xsl:template match="*[not(.//text())][normalize-space(.//@*) = '']" />

This gives you exactly the output you listed as expected (and is closer to what your attempt already showed).
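One caveat about both filters: in XSLT 1.0, normalize-space(.//@*) and normalize-space(@*) convert the node-set to the string value of its first node in document order, so only one attribute is actually tested. For example, with <Parent><Child a=""/><Child b="X"/></Parent>, the updated filter removes the whole Parent, although the second Child has a non-empty attribute. (With version="2.0", an XSLT 2.0 processor would typically raise a type error instead whenever there is more than one attribute.) The sketch below, assuming the same XSLT 1.0 setup, tests every attribute with a predicate:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes" />
    <xsl:strip-space elements="*" />

    <!-- identity template: copy every node and attribute unchanged -->
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
    </xsl:template>

    <!-- remove elements with no non-whitespace text at any depth and
         no non-empty attribute on themselves or on any descendant -->
    <xsl:template match="*[not(normalize-space())][not(.//@*[normalize-space()])]" />

</xsl:stylesheet>
```

On your sample input this should keep the same elements as the filter above, and for the example with two Child elements it keeps <Parent><Child b="X"/></Parent>. Using not(normalize-space()) instead of not(.//text()) also treats whitespace-only text as empty where xsl:strip-space does not apply, for example under xml:space="preserve".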

Upvotes: 1
