Jerry

Reputation: 27

Removing Nodes completely when a specific child element is found

I need to remove a node completely when a specific child element is found in it. For instance, my XML is as follows:

<?xml version="1.0"?>
<booklist>
 <book>
     <name>THEORY OF DYNAMICS</name>
     <author>JOHN</author>
     <price>09786</price>
 </book>
 <book>
     <name>ABCD</name>
     <author>STACEY</author>
     <price>765</price>
 </book>
 <book>
     <name>ABCD</name>
     <author>BTYSON</author>
     <price>34974</price>
 </book>
 <book>
     <name>ABCD</name>
     <author>CTYSON</author>
     <price>09534</price>
 </book>
 <book>
     <name>INTRODUCING JAVA</name>
     <author>CHARLES</author>
     <price>1234</price>
 </book>
 <book>
     <name>ABCD</name>
     <author>TYSON</author>
     <price>34534</price>
 </book>
</booklist>

So when I search for books whose name is 'ABCD', my result should be as follows:

OUTPUT XML:

<?xml version="1.0"?>
<booklist>
 <book>
     <name>THEORY OF DYNAMICS</name>
     <author>JOHN</author>
     <price>09786</price>
 </book>
 <book>
     <name>INTRODUCING JAVA</name>
     <author>CHARLES</author>
     <price>1234</price>
 </book>
</booklist>

The code I tried is as follows:

 try {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = factory.newDocumentBuilder();
        Document doc = docBuilder.parse(new File(FILENAME));
        NodeList list = doc.getElementsByTagName("*");
        for (int i = 0; i <list.getLength(); i++) {

            Node node = (Node) list.item(i);
            // Searching through entire file
            if (node.getNodeName().equalsIgnoreCase("book")) {
                NodeList childList = node.getChildNodes();
                // Looking through all children nodes
                for (int x = 0; x < childList.getLength(); x++) {
                    Node child = (Node) childList.item(x);
                    // To search only "book" children
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && child.getNodeName().equalsIgnoreCase("name")
                            && child.getTextContent().toUpperCase().equalsIgnoreCase("abcd".toUpperCase())) {
                        // Delete node here
                        node.getParentNode().removeChild(node);
                    }
                }
            }
        }
        try {
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMParser parser = new DOMParser();
        parser.parse(FILENAME);                       
        DOMSource source = new DOMSource(doc);
        StreamResult result = new StreamResult(new File(NEWFILE));
        transformer.transform(source, result);
        } catch (IOException io) {
            io.printStackTrace();

        }
    } catch (ParserConfigurationException pce) {
        pce.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } catch (SAXException saxe) {
        saxe.printStackTrace();
    }

I'm unable to delete all the book nodes that have a name child of "abcd". Instead, only alternate book nodes with that name are deleted. Can you suggest what the mistake in my code is? Why can't I delete all the book nodes whose name is 'abcd'?

Upvotes: 1

Views: 4949

Answers (1)

Sirko

Reputation: 74036

The DOM spec says that

NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects. For example, if a DOM user gets a NodeList object containing the children of an Element, then subsequently adds more children to that element (or removes children, or modifies them), those changes are automatically reflected in the NodeList, without further action on the user's part.

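So when you remove a node while traversing the NodeList list, the change is immediately reflected in that list: every later element moves down to a lower index. Your loop then increments i and skips the element that just moved into the removed node's position, so you never visit all elements. This is why only every other matching book is deleted.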
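To make the live behaviour concrete, here is a minimal sketch. The class name LiveNodeListDemo, its parse helper, and the four-book sample document are assumptions made up for illustration; only the standard org.w3c.dom and javax.xml.parsers calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class LiveNodeListDemo {

    // Hypothetical sample document with four empty <book> elements.
    static final String XML = "<booklist><book/><book/><book/><book/></booklist>";

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Removes the first <book> and reports the list length before and after.
    static int[] lengthsAroundRemoval() {
        NodeList books = parse(XML).getElementsByTagName("book");
        int before = books.getLength();
        Node first = books.item(0);
        first.getParentNode().removeChild(first);
        int after = books.getLength(); // same NodeList object, not re-queried
        return new int[] {before, after};
    }

    // Tries to remove every <book> in a forward loop; returns how many survive.
    static int survivorsOfForwardLoop() {
        NodeList books = parse(XML).getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Node book = books.item(i);
            book.getParentNode().removeChild(book); // shifts later items down by one
        }
        return books.getLength();
    }

    public static void main(String[] args) {
        int[] lengths = lengthsAroundRemoval();
        System.out.println("before=" + lengths[0] + ", after=" + lengths[1]);
        System.out.println("survivors=" + survivorsOfForwardLoop());
    }
}
```

With the four-book sample, the length drops from 4 to 3 without re-querying, and the forward loop leaves 2 books behind, the same "every other node" pattern described in the question.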

One solution is to first collect all the nodes you want to delete, and then delete them in a separate loop:

import java.util.ArrayList;
import java.util.List;

// ...

Document doc = docBuilder.parse(new File(FILENAME));
NodeList list = doc.getElementsByTagName("book");

// Collect the nodes to delete first, so the live NodeList is not modified while iterating
List<Node> toDelete = new ArrayList<>();

for (int i = 0; i < list.getLength(); i++) {

    Node node = list.item(i);
    NodeList childList = node.getChildNodes();

    // Look through all child nodes of this <book>
    for (int x = 0; x < childList.getLength(); x++) {

        Node child = childList.item(x);

        // Match only <name> child elements whose text is "abcd" (case-insensitive)
        if (child.getNodeType() == Node.ELEMENT_NODE
                && child.getNodeName().equalsIgnoreCase("name")
                && child.getTextContent().equalsIgnoreCase("abcd")) {
            // Only remember the node here; it is removed after the loop
            toDelete.add(node);
            break;
        }
    }

}

// Now remove the collected nodes
for (Node node : toDelete) {
    node.getParentNode().removeChild(node);
}

// ...

Alternatively, you could just traverse the list backwards, starting at list.getLength() - 1 and going down to 0. Removing a node then only shifts entries you have already visited.
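A minimal sketch of the backward traversal might look like the following. The class BackwardRemovalDemo and its helpers parse, hasName and removeBooksNamed are hypothetical names introduced for illustration; the trim() on the text content is also an assumption, added so that surrounding whitespace does not prevent a match.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class BackwardRemovalDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the given <book> has a <name> child whose text equals wanted, ignoring case.
    static boolean hasName(Node book, String wanted) {
        for (Node child = book.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "name".equals(child.getNodeName())
                    && wanted.equalsIgnoreCase(child.getTextContent().trim())) {
                return true;
            }
        }
        return false;
    }

    // Walks the live list from the last index down to 0 and removes matching books.
    static int removeBooksNamed(Document doc, String wanted) {
        NodeList books = doc.getElementsByTagName("book");
        int removed = 0;
        for (int i = books.getLength() - 1; i >= 0; i--) {
            Node book = books.item(i);
            if (hasName(book, wanted)) {
                // Only indices above i shift, and those were already visited.
                book.getParentNode().removeChild(book);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<booklist>"
                + "<book><name>THEORY OF DYNAMICS</name></book>"
                + "<book><name>ABCD</name></book>"
                + "<book><name>ABCD</name></book>"
                + "<book><name>INTRODUCING JAVA</name></book>"
                + "</booklist>");
        System.out.println("removed=" + removeBooksNamed(doc, "abcd"));
    }
}
```

Compared with the collect-then-delete version, this avoids the extra list at the cost of a slightly less obvious loop header.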


I also changed one other thing: your code traverses every element in the document and then manually filters for the <book> elements. I think it would be better to select just the <book> elements in the first place, using

NodeList list = doc.getElementsByTagName("book");

instead of

NodeList list = doc.getElementsByTagName("*");
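As a rough illustration of the difference between the two selectors, the sketch below counts the matches for each on a small document. The class TagSelectionDemo, its count helper, and the two-book sample are assumptions made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class TagSelectionDemo {

    // Hypothetical sample: one root, two books, each with a name and an author.
    static final String XML = "<booklist>"
            + "<book><name>A</name><author>X</author></book>"
            + "<book><name>B</name><author>Y</author></book>"
            + "</booklist>";

    // Returns how many elements getElementsByTagName(tagName) selects in the XML.
    static int count(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tagName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("\"*\" selects " + count(XML, "*") + " elements");
        System.out.println("\"book\" selects " + count(XML, "book") + " elements");
    }
}
```

On this sample, "*" selects all 7 elements (the root, both books, and their four children), while "book" selects only the 2 books, so no manual filtering on the node name is needed.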

Upvotes: 3
