Mike

Reputation: 15

Split an XML file into multiple files

Suppose I have the following XML file:

<a>
  <b>
   ....
  </b>
  <b>
   ....
  </b>
  <b>
   ....
  </b>
</a>

I want to split this file into multiple XML files based on the number of <b> tags, with one <b> element per output file.

Like:

File01.xml

<a>
  <b>
   ....
  </b>
</a>

File02.xml

<a>
  <b>
   ....
  </b>
</a>

File03.xml

<a>
  <b>
   ....
  </b>
</a>

And so on...

I'm new to Groovy and tried the following code:

import java.util.HashMap
import java.util.List
import javax.xml.parsers.DocumentBuilderFactory
import org.custommonkey.xmlunit.*
import org.w3c.dom.NodeList
import javax.xml.xpath.*
import javax.xml.transform.TransformerFactory
import org.w3c.dom.*
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult

class file_split {   

        File input = new File("C:\\file\\input.xml")
        def dbf  = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        def doc = new XmlSlurper(dbf).parse(ClassLoader.getSystemResourceAsStream(input));
        def xpath = XPathFactory.newInstance().newXPath()

        NodeList nodes = (NodeList) xpath.evaluate("//a/b", doc, XPathConstants.NODESET)

        def itemsPerFile = 5;
        def fileNumber = 0;

        def currentdoc = dbf.newDocument()
        def rootNode = currentdoc.createElement("a")
        def currentFile = new File(fileNumber + ".xml")

        try{
            for(i = 1; i <= nodes.getLength(); i++){
                def imported = currentdoc.importNode(nodes.item(i-1), true)
                rootNode.appendChild(imported)

                if(i % itemsPerFile == 0){
                    writeToFile(rootNode, currentFile)

                    rootNode = currentdoc.createElement("a");
                    currentFile = new File((++fileNumber)+".xml");
                }
            }
        }
        catch(Exception ex){
            logError(file.name,ex.getMessage());
            ex.printStackTrace();
        }

    def writeToFile(Node node, File file) throws Exception {
        def transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(node), new StreamResult(new FileWriter(file)));
    }
}

Any help would be greatly appreciated.

Upvotes: 0

Views: 2374

Answers (2)

tim_yates

Reputation: 171074

This should work, assuming file holds the XML content as a String (parseText expects text, not a File):

import groovy.xml.*

new XmlSlurper().parseText( file ).b.eachWithIndex { element, index ->
    new File( "/tmp/File${ "${index+1}".padLeft( 2, '0' ) }.xml" ).withWriter { w ->
        w << XmlUtil.serialize( new StreamingMarkupBuilder().bind {
            a { 
                mkp.yield element
            }
        } )
    }
}

If you want to group them, you can use collate (this example groups 2 b tags per file):

import groovy.xml.*

new XmlSlurper().parseText( file )
                .b
                .toList()
                .collate( 2 )
                .eachWithIndex { elements, index ->
    new File( "/tmp/File${ "${index+1}".padLeft( 2, '0' ) }.xml" ).withWriter { w ->
        w << XmlUtil.serialize( new StreamingMarkupBuilder().bind {
            a {
                elements.each { element ->
                    mkp.yield element
                }
            }
        } )
    }
}

Upvotes: 2

Robby Cornelissen

Reputation: 97120

I don't know what problem you are experiencing, but it looks like you're creating a new rootNode when needed, but not a new currentdoc. Try reinitializing currentdoc right before you reinitialize rootNode in your loop.
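
A minimal sketch of that idea, keeping the DOM approach from the question. It is not a drop-in fix for the original class: the helper name splitXml, its parameters, and the File01.xml naming are assumptions made for illustration. Each output file gets its own Document and root element, and the nodes are taken in chunks so a trailing partial group is still written.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory
import org.w3c.dom.NodeList

// Hypothetical helper (not from the question): writes the <b> children of <a>
// to outDir in groups of itemsPerFile and returns the files it created.
List<File> splitXml(File input, File outDir, int itemsPerFile) {
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    def source = builder.parse(input)
    def xpath = XPathFactory.newInstance().newXPath()
    NodeList nodes = (NodeList) xpath.evaluate('/a/b', source, XPathConstants.NODESET)
    def transformer = TransformerFactory.newInstance().newTransformer()
    def written = []

    for (int start = 0; start < nodes.getLength(); start += itemsPerFile) {
        // A fresh document AND a fresh root for every output file
        def currentDoc = builder.newDocument()
        def rootNode = currentDoc.createElement('a')
        currentDoc.appendChild(rootNode)

        // Copy this chunk of <b> nodes into the new document
        int end = Math.min(start + itemsPerFile, nodes.getLength())
        for (int i = start; i < end; i++) {
            rootNode.appendChild(currentDoc.importNode(nodes.item(i), true))
        }

        def outFile = new File(outDir, String.format('File%02d.xml', written.size() + 1))
        outFile.withWriter('UTF-8') { w ->
            transformer.transform(new DOMSource(currentDoc), new StreamResult(w))
        }
        written << outFile
    }
    written
}
```

Calling splitXml(new File("C:\\file\\input.xml"), new File("."), 1) would give one <b> per file, as asked in the question. Note that the loop in the question only writes when i % itemsPerFile == 0, so with itemsPerFile = 5 and three <b> elements nothing would be written at all.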

Upvotes: 0
