Robert Strauch

Reputation: 12896

Groovy: Handling large amounts of data with StreamingMarkupBuilder

The scenario is as follows. I have a plain text file containing 2,000,000 lines, each holding an ID. This list of IDs needs to be converted to a simple XML file. The following code works fine as long as the input file has only a few thousand entries.

import groovy.xml.StreamingMarkupBuilder

def xmlBuilder = new StreamingMarkupBuilder()
def f = new File(inputFile)
def input = f.readLines()
def xmlDoc = {
  Documents {
    input.each {
      Document(myAttribute: it)
    }
  }
}

def xml = xmlBuilder.bind(xmlDoc)
f.write(xml)

When all 2,000,000 entries are processed, I get an OutOfMemoryError for the Java heap (set to 1024M). Is there a way to improve the above code so that it can handle large amounts of data?

Cheers, Robert

Upvotes: 2

Views: 1800

Answers (2)

Steven

Reputation: 3894

Here's your problem: def input = f.readLines() ;-) It reads the entire file into a list in memory.

Upvotes: 0

tim_yates

Reputation: 171084

The issue with that code is that it loads everything into memory (readLines() returns the whole file as a list) before writing anything out...

This might be a better solution, as I believe it should write the data to the file output.xml as it reads each line of input.txt.

import groovy.xml.MarkupBuilder

new File( 'output.xml' ).withWriter { writer ->
  def builder = new MarkupBuilder( writer )
  builder.Documents {
    new File( 'input.txt' ).eachLine { line ->
      Document( attr: line )
    }
  }
}
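
Since the question asks about StreamingMarkupBuilder specifically, the same idea can probably be kept with that builder too. The following is only a sketch, not a measured fix: the method name streamIdsToXml and the UTF-8 encoding are assumptions for illustration, and the attribute name myAttribute is taken from the question. It relies on bind() returning a Writable whose closure is evaluated only when writeTo() is called, so each line should be written out as it is read rather than collected first.

```groovy
import groovy.xml.StreamingMarkupBuilder

// Hypothetical helper: writes one <Document> element per line of the input file.
def streamIdsToXml(File inputFile, File outputFile) {
    def builder = new StreamingMarkupBuilder()

    // bind() only wraps the closure; nothing is read or written yet.
    def markup = builder.bind {
        Documents {
            // eachLine reads one line at a time instead of building a list.
            inputFile.eachLine { line ->
                Document(myAttribute: line)
            }
        }
    }

    // The closure runs here, emitting markup straight to the writer.
    outputFile.withWriter('UTF-8') { writer ->
        markup.writeTo(writer)
    }
}
```

For the files in the question this would be called as streamIdsToXml(new File('input.txt'), new File('output.xml')). Note that it writes to a separate output file rather than overwriting the input, since the input is still being read while the output is written.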

Upvotes: 4
