Anoop

Reputation: 83

Removing a specified set of empty tags from XML using Java or XSLT

I have a requirement to remove a specified set of tags from an XML if they are empty.

For example:

<xml><tag1>value</tag1><tag2></tag2><tag3>value</tag3><tag4/><tag5/></xml>

In this, the tags to remove (if they are empty) are:

tag2, tag4

Expected result:

<xml><tag1>value</tag1><tag3>value</tag3><tag5/></xml>

What is the best way to achieve this using plain Java or XSLT? Alternatively, is there a third-party library that can do the same thing?

Regards, Anoop

Upvotes: 0

Views: 87

Answers (2)

uL1

Reputation: 2167

tags from an XML if they are empty.

What is empty? There are different possible definitions of "empty":

  1. no child elements
  2. no text nodes
  3. no text other than whitespace (space #x20, tab #x9, CR #xD or LF #xA)
  4. combinations of the above
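
In plain Java, the first three definitions could be written as DOM predicates. This is only a sketch: the class name EmptinessChecks and its method names are made up for illustration, and only the org.w3c.dom and javax.xml.parsers calls are standard JDK APIs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
final class EmptinessChecks {

    /** Definition 1: no child elements (text, comments and PIs are ignored). */
    static boolean hasNoChildElements(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    /** Definition 2: no text node children at all, not even whitespace. */
    static boolean hasNoTextNodes(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                return false;
            }
        }
        return true;
    }

    /** Definition 3: the text of the element and its descendants is only whitespace. */
    static boolean hasOnlyWhitespaceText(Element e) {
        return e.getTextContent().trim().isEmpty();
    }

    /** Parses a string and returns its root element, keeping whitespace text nodes. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }
}
```

Definition 4 is then just a boolean combination, for example hasNoChildElements(e) && hasOnlyWhitespaceText(e).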

Test input:

<root>
    <tag1>value</tag1>
    <tag2></tag2>
    <tag3><tag3_1/></tag3>
    <tag4><tag4_1/> </tag4>
    <tag5> </tag5>
    <tag6/>
    <tag7>

    </tag7>
</root>

Test transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="root">

        <!-- no child elements -->
        <xsl:text>"*[not(*)]" matches: </xsl:text>
        <xsl:for-each select="*[not(*)]">
            <xsl:value-of select="name()"/><xsl:text> </xsl:text>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>

        <!-- no child nodes of any kind (see the note on node() below the output) -->
        <xsl:text>"*[not(node())]" matches: </xsl:text>
        <xsl:for-each select="*[not(node())]">
            <xsl:value-of select="name()"/><xsl:text> </xsl:text>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>

        <!-- no text node children -->
        <xsl:text>"*[not(text())]" matches: </xsl:text>
        <xsl:for-each select="*[not(text())]">
            <xsl:value-of select="name()"/><xsl:text> </xsl:text>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>

        <!-- no text left after whitespace normalization -->
        <xsl:text>"*[not(normalize-space(.))]" matches: </xsl:text>
        <xsl:for-each select="*[not(normalize-space(.))]">
            <xsl:value-of select="name()"/><xsl:text> </xsl:text>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>

        <!-- combination -->
        <xsl:text>"*[not(normalize-space(.)) and not(*)]" matches: </xsl:text>
        <xsl:for-each select="*[not(normalize-space(.)) and not(*)]">
            <xsl:value-of select="name()"/><xsl:text> </xsl:text>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>

    </xsl:template>
</xsl:stylesheet>

Output:

"*[not(*)]" matches: tag1 tag2 tag5 tag6 tag7 
"*[not(node())]" matches: tag2 tag6 
"*[not(text())]" matches: tag2 tag3 tag6 
"*[not(normalize-space(.))]" matches: tag2 tag3 tag4 tag5 tag6 tag7 
"*[not(normalize-space(.)) and not(*)]" matches: tag2 tag5 tag6 tag7 

node() is a node test, not a function. It matches any node type that can be selected via the child:: axis:

  • element node
  • text node
  • processing-instruction (PI) node
  • comment node
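
To see the practical effect of this from Java, here is a small sketch that evaluates XPath expressions with the JDK's javax.xml.xpath API. The class NodeTestDemo, the method matchingNames and the sample document are invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are illustrative only.
final class NodeTestDemo {

    /** Returns the names of all nodes selected by the given XPath expression. */
    static List<String> matchingNames(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getNodeName());
            }
            return names;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, ex);
        }
    }

    public static void main(String[] args) {
        // b holds only a comment, c only a PI, d only whitespace.
        String xml = "<root><a/><b><!-- note --></b><c><?pi data?></c><d> </d></root>";
        System.out.println(matchingNames(xml, "/root/*[not(node())]")); // [a]
        System.out.println(matchingNames(xml, "/root/*[not(*)]"));      // [a, b, c, d]
    }
}
```

So an element that contains nothing but a comment or a PI is not "empty" for not(node()), although it has no child elements and no text.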

Upvotes: 1

michael.hor257k

Reputation: 116992

the tags to remove (if they are empty) are : tag2, tag4

This is rather trivial to do in XSLT:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="tag2[not(node())] | tag4[not(node())]"/>

</xsl:stylesheet>
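
Since the question also asks about plain Java: a stylesheet like this can be run with the javax.xml.transform API that ships with the JDK, so no third-party library is needed. The following is a minimal sketch, not a definitive implementation. The class EmptyTagRemover and its method removeEmpty are made-up names, and the embedded stylesheet is the one above with the output settings reduced to omit-xml-declaration (an assumption made to keep the result compact).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical wrapper class; the names are illustrative only.
final class EmptyTagRemover {

    // Identity transform plus one empty template that drops empty tag2/tag4.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='tag2[not(node())] | tag4[not(node())]'/>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML string and returns the result. */
    static String removeEmpty(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        String input = "<xml><tag1>value</tag1><tag2></tag2>"
                     + "<tag3>value</tag3><tag4/><tag5/></xml>";
        // tag2 and tag4 are dropped; tag1, tag3 and the empty tag5 are kept.
        System.out.println(removeEmpty(input));
    }
}
```

Note that not(node()) by itself only treats an element with no child nodes at all as empty. Because of xsl:strip-space, whitespace-only text nodes are removed from the source tree first, so a tag2 or tag4 containing only whitespace should be dropped as well.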

Upvotes: 0
