Vivek

Reputation: 1326

Creating and saving large XMLs in Java

I am working on a Java application whose job is to create and save large XML files. The sample I got is a 300 MB XML file.

The app was designed to collect bulk data from the database and save it in XML format. Because of its heavy I/O and memory usage, it was designed to process at most 3 such requests in parallel.

Now the requirement is to make it process up to 50 such requests in parallel. The current app uses XMLBeans to create the XML and then saves it to the file system. The application is exposed as a web service on a WebLogic server (it's on a 64-bit OS and the Java max heap size is 4 GB).

I need your opinion on:

1) Is there an XML API that works with XSD and can be used to create large XMLs (200-200 MB) with minimum overhead? XMLBeans works fine for us, but is there something that can handle it better?

2) What would be the best and most memory-efficient way to save it to the file system? I am thinking of changing the current writer to a BufferedWriter and having it hold 1024 bytes in memory before a physical write to disk happens. Can there be any side effect to increasing that size?

3) If there is no limit on technology choice, server, etc., what would be the ideal solution?

EDIT 1# The DB access is fast (about 5% of total time). Creating the XML is slow (takes 80% of the time). Saving it takes 15% (but there are a lot of improvements I see I can do, so I am not worried about that). - Thanks Luis.

Upvotes: 3

Views: 8511

Answers (2)

mabr

Reputation: 380

I had a similar problem. A server was writing data to XML files with JDOM. Over the years the data got bigger, the server got slower, and the memory usage became huge. The reason for this was the following:

The server accumulated the data in big hashtables and lists. At the end of a job it created the XML document with JDOM in memory and then wrote it to disk.

I changed the XML writing to use a streaming approach with an XMLStreamWriter. The only problem was that the written XML file was not very pretty. This could be solved with an IndentingXMLStreamWriter.

A code example would be:

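// Needs java.io.FileOutputStream and javax.xml.stream.* imported.
// IndentingXMLStreamWriter is not part of the javax.xml.stream API; it comes from a separate library.
XMLOutputFactory factory = XMLOutputFactory.newInstance();
FileOutputStream fileOutputStream = new FileOutputStream(outXmlFile);
XMLStreamWriter defaultWriter = factory.createXMLStreamWriter(fileOutputStream, encoding);
IndentingXMLStreamWriter writer = new IndentingXMLStreamWriter(defaultWriter);
writer.setIndentStep("  ");
try
{
    writer.writeStartDocument(encoding, "1.0");

    // Optional stylesheet reference, written before the root element
    if (stylesheet != null)
    {
        writer.writeProcessingInstruction("xml-stylesheet", "type='text/xsl' href='" + stylesheet + "'");
        writer.writeCharacters("\n");
    }

    writer.writeStartElement(TAG_ROOT);
    writer.writeAttribute(TAG_OBJECT_TYPE, rootObject.getClass().getSimpleName());

    ...

    writer.writeEndElement();
    writer.writeEndDocument();
}
finally
{
    try
    {
        writer.flush();
        writer.close(); // XMLStreamWriter.close() does not close the underlying stream
    }
    finally
    {
        // Close the file even if flushing or closing the writer failed
        fileOutputStream.close();
    }
}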
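
For anyone who wants something they can run without the extra indenting class, here is a minimal JDK-only sketch of the same streaming idea. The class name, the element names ("rows", "row", "id") and the 64 KB buffer size are made up for illustration and are not from my server code; the point is only that each row is written as soon as it is read, so nothing accumulates in memory.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StreamingXmlExport {

    /**
     * Writes one <row> element per item pulled from the iterator.
     * Only the current row is held in memory at any time.
     * Each row is assumed to be {id, text}; that layout is hypothetical.
     */
    public static long export(Iterator<String[]> rows, Path target)
            throws IOException, XMLStreamException {
        long count = 0;
        // 64 KB is an arbitrary illustrative buffer size, not a tuned value.
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target), 64 * 1024)) {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            try {
                writer.writeStartDocument("UTF-8", "1.0");
                writer.writeStartElement("rows");
                while (rows.hasNext()) {
                    String[] row = rows.next();
                    writer.writeStartElement("row");
                    writer.writeAttribute("id", row[0]);
                    writer.writeCharacters(row[1]); // special characters are escaped by the writer
                    writer.writeEndElement();
                    count++;
                }
                writer.writeEndElement();
                writer.writeEndDocument();
                writer.flush();
            } finally {
                // XMLStreamWriter is not AutoCloseable and does not close 'out';
                // the try-with-resources above takes care of the stream.
                writer.close();
            }
        }
        return count;
    }
}
```

In a real application the iterator would typically wrap a database cursor, so rows are fetched and written one at a time instead of being collected into a list first.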

Upvotes: 5

beny23

Reputation: 35018

I would look into using a streaming XML API such as StAX to avoid having to hold the whole XML document in memory before writing it out to disk. That way the memory footprint can be kept low (you don't need 50x the size of the XML to process 50 documents in parallel)...

See Why StAX? (Oracle) for more info.
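
To make the parallel case concrete, here is a rough sketch, assuming each request can be modelled as a task that streams its own document to its own file. The class name, the file naming scheme, the element names and the generated record contents are all placeholders I made up; a real export would pull records from the database instead. Each task only holds the record it is currently writing plus the output buffer, so the per-document memory cost should not depend on the document size.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class ParallelStaxExport {

    /** Streams recordCount generated <record> elements to target; nothing is accumulated. */
    static Path writeDocument(Path target, int recordCount) throws Exception {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            try {
                w.writeStartDocument("UTF-8", "1.0");
                w.writeStartElement("records");
                for (int i = 0; i < recordCount; i++) {
                    w.writeStartElement("record");
                    w.writeCharacters("value " + i); // placeholder payload
                    w.writeEndElement();
                }
                w.writeEndElement();
                w.writeEndDocument();
                w.flush();
            } finally {
                w.close(); // does not close 'out'; try-with-resources does
            }
        }
        return target;
    }

    /** Runs 'documents' exports on a pool of 'threads' workers and returns the written files. */
    public static List<Path> exportAll(Path dir, int documents, int recordsEach, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (int d = 0; d < documents; d++) {
                final Path target = dir.resolve("export-" + d + ".xml");
                futures.add(pool.submit(() -> writeDocument(target, recordsEach)));
            }
            List<Path> written = new ArrayList<>();
            for (Future<Path> f : futures) {
                written.add(f.get()); // rethrows any failure from the task
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call like `ParallelStaxExport.exportAll(dir, 50, 1_000_000, 50)` would run 50 exports at once. Whether 50 threads is the right pool size depends on disk and CPU, since streaming fixes the memory problem but not the I/O contention.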

Upvotes: 3
