nmu

Reputation: 1519

How to make minor edits to an XML file in Java

I am trying to change a single value in a large (5 MB) XML file. The value is always within the first 10 lines, so I should not need to read 99% of the file. Yet doing a partial XML read in Java seems quite tricky.

In this picture you can see the single value I need to access.

I have read a lot about XML in Java and the best practices for handling it. However, I am unsure which approach fits this case: DOM, StAX and SAX parsers each have different ideal use cases, and all I need to do is edit one value.

Perhaps I shouldn't even use an XML parser and should just go with a regex, but it seems like a pretty bad idea to use regex on XML.

Hoping someone can point me in the right direction. Thanks!

Upvotes: 3

Views: 349

Answers (2)

ug_

Reputation: 11440

You can use a StAX parser to write the XML as you read it, replacing the content as it is parsed. A StAX parser holds only part of the XML in memory at any given time.

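import java.io.*;
import java.util.*;
import javax.xml.namespace.QName;
import javax.xml.stream.*;
import javax.xml.stream.events.*;

public static void main(String[] args) throws Exception {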
    final String newProjectId = "888";

    File inputFile = new File("in.xml");
    File outputFile = new File("out.xml");
    System.out.println("Reading " + inputFile);
    System.out.println("Writing " + outputFile);

    XMLInputFactory inFactory = XMLInputFactory.newInstance();
    XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream(inputFile));
    XMLOutputFactory factory = XMLOutputFactory.newInstance();
    // write bytes rather than chars so the platform default charset is not silently applied
    XMLEventWriter writer = factory.createXMLEventWriter(new FileOutputStream(outputFile));
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();


    boolean useExistingEvent; // specifies if we should use the event right from the reader
    while (eventReader.hasNext()) {
        XMLEvent event = eventReader.nextEvent();
        useExistingEvent = true;

        // look for our Project element
        if(event.getEventType() == XMLEvent.START_ELEMENT) {
            // inspect the start element and its ObjectID attribute
            StartElement elemEvent = event.asStartElement();
            Attribute attr = elemEvent.getAttributeByName(QName.valueOf("ObjectID"));
            // check to see if this is the project we want 
            // TODO: put what logic you want here
            if("Project".equals(elemEvent.getName().getLocalPart()) && attr != null && attr.getValue().equals("1")) {
                // we need a new attribute list for this element that excludes the old Version attribute
                List<Attribute> newAttrs = new ArrayList<>();
                Iterator<Attribute> existingAttrs = elemEvent.getAttributes();
                while(existingAttrs.hasNext()) {
                    Attribute existing = existingAttrs.next();
                    // copy over everything but the Version attribute
                    if(!existing.getName().getLocalPart().equals("Version")) {
                        newAttrs.add(existing);
                    }
                }
                // add the Version attribute with the new value; building the name here
                // avoids a NullPointerException when the element has no Version attribute
                newAttrs.add(eventFactory.createAttribute(QName.valueOf("Version"), newProjectId));

                // we're writing our own event instead of the existing one
                useExistingEvent = false;
                writer.add(eventFactory.createStartElement(elemEvent.getName(), newAttrs.iterator(), elemEvent.getNamespaces()));
            }
        }

        // persist the existing event.
        if(useExistingEvent) {
            writer.add(event);
        }

    }
    writer.close();
    eventReader.close();
}
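
If you only need to read the current value before deciding whether to rewrite the file, the cursor-based StAX API lets you stop as soon as the element is found, so the rest of the document is never parsed. This is a minimal sketch, not part of the code above: the class name `PeekVersion`, the method `readProjectVersion`, and the sample XML in `main` are assumptions made for illustration, and the element and attribute names are taken from the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PeekVersion {
    /** Returns the Version attribute of the first Project element, or null if there is none. */
    static String readProjectVersion(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Project".equals(reader.getLocalName())) {
                    // Return immediately: nothing after this element is parsed.
                    // A null namespace URI means the namespace is not checked.
                    return reader.getAttributeValue(null, "Version");
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input, shaped like the file described in the question.
        String xml = "<PremiereData><Project ObjectID=\"1\" Version=\"25\"/><Rest/></PremiereData>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readProjectVersion(in));
    }
}
```

Changing the value still requires writing the whole file out again, as in the event-based code above, because the new value may differ in length from the old one.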

Upvotes: 2

Sean Bright

Reputation: 120704

I would choose DOM over SAX or StAX simply for the (relative) simplicity of the API. Yes, there is some boilerplate code to get the DOM populated, but once you get past that it is fairly straightforward.

Having said that, if your XML source is hundreds or thousands of megabytes, one of the streaming APIs would be better suited. As it is, 5 MB is not what I would consider a large dataset, so go ahead and use DOM and call it a day:

import java.io.File;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class ChangeVersion
{
    public static void main(String[] args)
            throws Exception
    {
        if (args.length < 3) {
            System.err.println("Usage: ChangeVersion <input> <output> <new version>");
            System.exit(1);
        }

        File inputFile = new File(args[0]);
        File outputFile = new File(args[1]);
        int updatedVersion = Integer.parseInt(args[2], 10);

        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = domFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(inputFile);

        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        XPathExpression expr = xpath.compile("/PremiereData/Project/@Version");

        NodeList versionAttrNodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

        for (int i = 0; i < versionAttrNodes.getLength(); i++) {
            Attr versionAttr = (Attr) versionAttrNodes.item(i);
            versionAttr.setNodeValue(String.valueOf(updatedVersion));
        }

        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();

        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(outputFile));
    }
}
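
One thing to watch for: if the XPath expression matches nothing, the loop above does nothing and the output is written unchanged. A rough sketch of the same update as a helper that reports how many attributes it changed, so the caller can detect that case, might look like this. The class name `VersionUpdater`, the helper names, and the in-memory sample XML are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class VersionUpdater {
    /** Sets every /PremiereData/Project/@Version to newVersion and returns how many were changed. */
    static int updateVersion(Document doc, String newVersion) throws Exception {
        NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/PremiereData/Project/@Version", doc, XPathConstants.NODESET);
        for (int i = 0; i < attrs.getLength(); i++) {
            ((Attr) attrs.item(i)).setValue(newVersion);
        }
        return attrs.getLength();
    }

    /** Parses an XML string into a DOM document (used here only to keep the sketch self-contained). */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input; a real run would parse the input file instead.
        Document doc = parse("<PremiereData><Project ObjectID=\"1\" Version=\"25\"/></PremiereData>");
        int changed = updateVersion(doc, "30");
        if (changed == 0) {
            // Fail loudly rather than writing an unchanged copy of the file.
            throw new IllegalStateException("No /PremiereData/Project/@Version attribute found");
        }
        Element project = (Element) doc.getDocumentElement().getFirstChild();
        System.out.println(changed + " attribute(s) updated, Version=" + project.getAttribute("Version"));
    }
}
```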

Upvotes: 2
