satish marathe

Reputation: 1133

Reading and updating a large XML file in Java

I have an XML file of about 400 MB. I need to find a specific element and reformat its date attribute from mm-dd-yyyy to dd-mm-yyyy. Here is the code I am using:

    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(inputXML);
    doc.getDocumentElement().normalize();

    // format the date
    NodeList nodes = doc.getElementsByTagName("empDetails");
    for (int i = 0; i < nodes.getLength(); i++) {
        String oldDate = nodes.item(i).getAttributes().getNamedItem("doj").getNodeValue();
        String newValue = ...; // oldDate formatted to dd-mm-yyyy
        nodes.item(i).getAttributes().getNamedItem("doj").setTextContent(newValue);
    }

    // now write the content back into the xml file
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer;
    transformer = transformerFactory.newTransformer();
    DOMSource source = new DOMSource(doc);
    StreamResult result = new StreamResult(new File(fileName));
    transformer.transform(source, result);

However, this throws an out-of-memory error; on 32-bit Windows it fails.

So I tried it on a Unix box and set the maximum heap size: java -Xmx3072m -classpath . MyTest

It ran for some time but failed again.

Questions: Is it possible to handle a 400 MB file when I only want to selectively update and save it? (I am sure the answer is yes.) Is my code bad, and is there anything I should change? Should I bump up the heap size further? (No Unix shell scripts as an alternative solution, please; my intent is to use Java.)

Thanks, satish

Upvotes: 1

Views: 1698

Answers (1)

Dev

Reputation: 12196

It would probably be better to use the StAX API to read the document as a stream while immediately writing out (again using StAX) the parts you don't want to change to a temporary file. When you get to a part you are interested in, change the values before writing it to the temporary file. When you are done, you can rename the temporary file over the old one.
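The temporary-file step could look roughly like the sketch below. The class `TempFileSwap` and the `Transform` callback are hypothetical names introduced only for illustration; the sketch assumes the temporary file may be created next to the original, so that the final move stays on the same file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: names are illustrative, not from the original answer.
final class TempFileSwap {

    // Callback that reads from 'source' and writes the rewritten document to 'target'.
    interface Transform {
        void apply(Path source, Path target) throws IOException;
    }

    static void rewriteInPlace(Path original, Transform transform) throws IOException {
        // Create the temp file in the same directory as the original.
        Path dir = original.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "rewrite-", ".tmp");
        try {
            transform.apply(original, temp);
            // Replace the old file only after the new one is completely written.
            Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the transform failed.
            Files.deleteIfExists(temp);
        }
    }
}
```

If the transform throws, the original file is left untouched and the partial temporary file is deleted.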

I'd recommend XMLEventReader and XMLEventWriter. XMLEvents you don't care about can be passed directly from the reader to the writer. This keeps only the small part of the document you are currently working on in memory.

    XMLEventReader reader = ...;
    XMLEventWriter writer = ...;
    XMLEvent cursor;

    while (reader.hasNext()) {
        cursor = reader.nextEvent();
        if (doICareAboutThisEvent(cursor)) {
            // an event of interest: write a modified copy instead
            writer.add(changeEvent(cursor));
        } else {
            // everything else passes straight through
            writer.add(cursor);
        }
    }

Obviously, a real implementation can be more involved, and deciding which elements to care about and edit may depend on more than the state of a single element. This is just a very simple example.
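For the case in the question, a fuller version might look like the sketch below. It assumes the element is named `empDetails` and the attribute `doj`, as in the question's code, and that every `doj` value has the form mm-dd-yyyy. The class `DojRewriter` and its helper methods are hypothetical names, not part of the StAX API.

```java
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class: streams XML from 'in' to 'out', rewriting empDetails/@doj.
final class DojRewriter {
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    // mm-dd-yyyy -> dd-mm-yyyy; values without exactly three parts are returned unchanged.
    static String swapDayAndMonth(String date) {
        String[] p = date.split("-");
        return p.length == 3 ? p[1] + "-" + p[0] + "-" + p[2] : date;
    }

    // Plays the role of doICareAboutThisEvent: only empDetails start tags are edited.
    static boolean isEmpDetailsStart(XMLEvent event) {
        return event.isStartElement()
                && "empDetails".equals(event.asStartElement().getName().getLocalPart());
    }

    // Plays the role of changeEvent: copies the start tag, replacing the doj attribute.
    static XMLEvent withReformattedDoj(StartElement start) {
        List<Attribute> attrs = new ArrayList<>();
        Iterator<?> it = start.getAttributes();
        while (it.hasNext()) {
            Attribute a = (Attribute) it.next();
            if ("doj".equals(a.getName().getLocalPart())) {
                a = EVENTS.createAttribute(a.getName(), swapDayAndMonth(a.getValue()));
            }
            attrs.add(a);
        }
        return EVENTS.createStartElement(start.getName(), attrs.iterator(), start.getNamespaces());
    }

    static void rewrite(Reader in, Writer out) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                writer.add(isEmpDetailsStart(event)
                        ? withReformattedDoj(event.asStartElement())
                        : event);
            }
        } finally {
            // Flushes and frees StAX resources; the underlying Reader/Writer stay open.
            writer.close();
            reader.close();
        }
    }
}
```

For a 400 MB file you would pass buffered file readers and writers (the output pointing at the temporary file) and close them yourself afterwards, since closing the StAX reader and writer does not close the underlying streams.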

Upvotes: 2
