Reputation: 1519
I am trying to change a single value in a large (5mb) XML file. I always know the value will be in the first 10 lines, therefore I do not need to read in 99% of the file. Yet it seems doing a partial XML read in Java is quite tricky.
In this picture you can see the single value I need to access.
I have read a lot about XML in Java and the best practices of handling it. However, in this case I am unsure of what the best approach would be - A DOM, STAX or SAX parser all seem to have different best use case scenarios - and I am not sure which would best suit this problem. Since all I need to do is edit one value.
Perhaps, I shouldn't even use an XML parser and just go with regex, but it seem like it is a pretty bad idea to use regex on XML
Hoping someone could point me in the right direction, Thanks!
Upvotes: 3
Views: 349
Reputation: 11440
You can use the StAX parser to write the XML as you read it. While doing this you can replace the content as it parses. Using a StAX parser will only contain parts of the xml in memory at any given time.
public static void main(String [] args) throws Exception {
final String newProjectId = "888";
File inputFile = new File("in.xml");
File outputFile = new File("out.xml");
System.out.println("Reading " + inputFile);
System.out.println("Writing " + outputFile);
XMLInputFactory inFactory = XMLInputFactory.newInstance();
XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream(inputFile));
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLEventWriter writer = factory.createXMLEventWriter(new FileWriter(outputFile));
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
boolean useExistingEvent; // specifies if we should use the event right from the reader
while (eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
useExistingEvent = true;
// look for our Project element
if(event.getEventType() == XMLEvent.START_ELEMENT) {
// read characters
StartElement elemEvent = event.asStartElement();
Attribute attr = elemEvent.getAttributeByName(QName.valueOf("ObjectID"));
// check to see if this is the project we want
// TODO: put what logic you want here
if("Project".equals(elemEvent.getName().getLocalPart()) && attr != null && attr.getValue().equals("1")) {
Attribute versionAttr = elemEvent.getAttributeByName(QName.valueOf("Version"));
// we need to make a list of new attributes for this element which doesnt include the Version a
List<Attribute> newAttrs = new ArrayList<>(); // new list of attrs
Iterator<Attribute> existingAttrs = elemEvent.getAttributes();
while(existingAttrs.hasNext()) {
Attribute existing = existingAttrs.next();
// copy over everything but version attribute
if(!existing.getName().getLocalPart().equals("Version")) {
newAttrs.add(existing);
}
}
// add our new attribute for projectId
newAttrs.add(eventFactory.createAttribute(versionAttr.getName(), newProjectId));
// were using our own event instead of the existing one
useExistingEvent = false;
writer.add(eventFactory.createStartElement(elemEvent.getName(), newAttrs.iterator(), elemEvent.getNamespaces()));
}
}
// persist the existing event.
if(useExistingEvent) {
writer.add(event);
}
}
writer.close();
}
Upvotes: 2
Reputation: 120704
I would choose DOM over SAX or StAX simply for the (relative) simplicity of the API. Yes, there is some boilerplate code to get the DOM populated, but once you get past that it is fairly straight-forward.
Having said that, if your XML source is 100s or 1000s of megabytes, one of the streaming APIs would be better suited. As it is, 5MB is not what I would consider a large dataset, so go ahead and use DOM and call it a day:
import java.io.File;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class ChangeVersion
{
public static void main(String[] args)
throws Exception
{
if (args.length < 3) {
System.err.println("Usage: ChangeVersion <input> <output> <new version>");
System.exit(1);
}
File inputFile = new File(args[0]);
File outputFile = new File(args[1]);
int updatedVersion = Integer.parseInt(args[2], 10);
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = domFactory.newDocumentBuilder();
Document doc = docBuilder.parse(inputFile);
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
XPathExpression expr = xpath.compile("/PremiereData/Project/@Version");
NodeList versionAttrNodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < versionAttrNodes.getLength(); i++) {
Attr versionAttr = (Attr) versionAttrNodes.item(i);
versionAttr.setNodeValue(String.valueOf(updatedVersion));
}
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(new DOMSource(doc), new StreamResult(outputFile));
}
}
Upvotes: 2