Reputation: 1133
I have an XML file about 400 MB I need to find a specific element and then reformat its date attribute from mm-dd-yyyy to dd-mm-yyyy Here is the code that I am using
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputXML);
doc.getDocumentElement().normalize();
//format the date
NodeList nodes = doc.getElementsByTagName("empDetails");
for (int i = 0; i < nodes.getLength(); i++){
String oldDate =nodes.item(i).getAttributes().getNamedItem("doj").getNodeValue();
String newValue = //formatted to dd-mm-yyyy
nodes.item(i).getAttributes().getNamedItem("doj").setTextContent(newValue);
}
//now write back to file
// write the content into xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer;
transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(fileName));
transformer.transform(source, result);
However this is throwing out of memory On windows 32 bit - it fails
So I tried this on a unix box and set the memory to : java -Xmx3072m -classpath . MyTest
It did run for some time but failed again
Question - is it possible to be handling a file of 400 MB where I want to selectivey update and save? ( am sure the answer is yes ) Is my code bad - anything that I should change ? ( no unix shell scripts as an alternate solution please - my intent is to use java ) should I be bumping up the heap size further ? Thanks, satish
Upvotes: 1
Views: 1698
Reputation: 12196
It would probably be better to use the StAX api read the document like a stream while writing out (again using StAX) the parts you don't want to change immediately to a a temporary file. When you get to a part you are interested in, change the values before feeding it back to the temporary file. When you are done you can rename the temporary file over the old one.
I'd recommend the XMLEventReader
and XMLEventWriter
. XMLEvents
you don't care about you can pass directly through from reader to writer. This will only keep small parts of the document you are working on in memory.
XMLEventReader reader = ...;
XMLEventWriter writer = ...;
XMLEvent cursor;
while(reader.hasNext()){
cursor = reader.nextEvent();
if(doICareAboutThisEvent(cursor)){
writer.add(changeEvent(cursor));
}else{
writer.add(cursor);
}
}
Obviously the implementation can be more complicated and your decisions about which elements to care about and edit can be more complicated than the state of a single element. This is just a very simple example.
Upvotes: 2