achraf

Reputation: 69

Remove empty XML elements in Java

I have an output XML document that contains empty elements, as well as empty elements that carry attributes.

I checked some older posts, which helped me solve part of my problem.

I used the following XSLT solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:if test=". != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

But the problem is that it also removes elements that have no text of their own but contain child elements with attributes, such as:

<CurrencyList> 
<Currency code="EURO"/> 
<Currency code="USD"/>
</CurrencyList>

Does anyone have an idea how to solve this problem?

Many thanks

Upvotes: 1

Views: 6506

Answers (2)

Joop Eggen

Reputation: 109547

It is like deleting empty directories: you have to do a depth-first recursive walk. Only once all of its subdirectories have been dealt with can you decide whether the current directory is empty and may be deleted.

Consequently, the deletion is best done in Java with recursion. The advantage is that no copy of the document is needed.


Code

On request, since working with the XML API involves many scattered pieces, here is some untested code:

import java.io.*;
import java.util.*;
import java.util.logging.*; // Logger and Level are not covered by java.util.*
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import org.xml.sax.SAXException;

public class XmlCleanup {

    public static void main(String[] args) {
        if (args.length == 0) {
            args = new String[] { "/home/joop/Labortablo/test1.xml" };
        }
        new XmlCleanup().process(args[0]);
    }

    public void process(String xmlPath) {
        try {
            // Read XML document:
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new File(xmlPath));

            removeEmptyChildElements(doc.getDocumentElement());

            // Write XML document back:
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            DOMSource source = new DOMSource(doc);
            StreamResult result = new StreamResult(new File(xmlPath
                    .replaceFirst("\\.xml$", "") + "-clean.xml"));
            transformer.transform(source, result);
        } catch (TransformerException | SAXException | IOException
                | ParserConfigurationException ex) {
            Logger.getLogger(XmlCleanup.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    private void removeEmptyChildElements(Element parentElement) {
        List<Element> toRemove = new LinkedList<Element>();

        NodeList children = parentElement.getChildNodes();
        int childrenCount = children.getLength();
        for (int i = 0; i < childrenCount; ++i) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element childElement = (Element) child;
                removeEmptyChildElements(childElement);
                if (elementIsRedundant(childElement)) {
                    toRemove.add(childElement);
                }
            }
        }

        for (Element childElement: toRemove) {
            parentElement.removeChild(childElement);
        }
        parentElement.normalize();
    }

    private boolean elementIsRedundant(Element element) {
        if (element.hasAttributes())
            return false;
        if (!element.hasChildNodes())
            return true;
        NodeList children = element.getChildNodes();
        int childrenCount = children.getLength();
        for (int i = 0; i < childrenCount; ++i) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // A child element that survived the recursive cleanup
                // (for example one with attributes) keeps its parent alive.
                return false;
            }
            String value = child.getNodeValue();
            if (value != null && !value.matches("\\s*")) {
                return false; // Found non-whitespace text
            }
        }
        return true;
    }
}

It uses javax.xml.transform to write the result, so you could use an XSLT transformation too; a bit simpler would be to use javax.xml.stream.XMLOutputFactory.

Upvotes: 1

Jörn Horstmann

Reputation: 34014

I think you are on the right track by starting with the identity transform. I would suggest keeping the identity template as is and then adding a more specific template that ignores empty elements.

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
</xsl:template>

The call to `normalize-space` in the template below collapses all consecutive whitespace, so an element that contains only whitespace (which is usually just indentation) counts as empty. The second part of the match then excludes from removal all elements that either have attributes themselves or have descendants with attributes. For debugging purposes I let the template create a comment in the output whenever an element gets removed.

<xsl:template match="*[normalize-space(.) = '' and not(descendant-or-self::*/@*)]">
    <xsl:comment><xsl:value-of select="name(.)" /></xsl:comment>
</xsl:template>
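
Since the question is about Java, here is a minimal sketch of how the two templates above could be applied with javax.xml.transform. The class name EmptyElementFilter, the method removeEmptyElements and the sample input are hypothetical names chosen for illustration. The sketch also drops the debugging comment, so removed elements simply disappear from the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; not part of any library.
public class EmptyElementFilter {

    // Identity template plus the more specific "empty element" template,
    // wrapped in a stylesheet element. The debugging comment is omitted.
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"*[normalize-space(.) = ''"
      + " and not(descendant-or-self::*/@*)]\"/>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the given XML text and returns the result.
    public static String removeEmptyElements(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            // Wrapped so that callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: Empty and Blank should be dropped, the rest kept.
        String input = "<Root><Empty/><Blank>  </Blank>"
                + "<CurrencyList><Currency code=\"EURO\"/>"
                + "<Currency code=\"USD\"/></CurrencyList>"
                + "<Name>test</Name></Root>";
        System.out.println(removeEmptyElements(input));
    }
}
```

With this input, the Empty and Blank elements should be dropped, while CurrencyList, its Currency children and Name are kept. The more specific template wins over the identity template because a pattern with a predicate has a higher default priority in XSLT 1.0.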

Upvotes: 0
