Roland Illig

Reputation: 41627

Java XML parser without excessive memory allocation

At work I parse large XML files using the SAX DefaultHandler class. While doing that, I noticed that this API allocates many Strings: for element names, attribute names and values, and so on.
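For illustration, here is a minimal sketch of the SAX callbacks in question. The class names SaxCallbackDemo and CountingHandler are made up for this example. Element names arrive as String arguments, while text arrives as a slice of a char[] buffer, so a handler that only counts or compares characters can avoid building Strings for text content. Whether the name Strings are freshly allocated for every event depends on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
class SaxCallbackDemo {

  // Counts elements and text characters without creating any Strings itself.
  static class CountingHandler extends DefaultHandler {
    int elements;
    int textChars;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
      // The parser hands over element and attribute names as Strings.
      elements++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      // Text is a slice of the parser's buffer; it may be delivered in several chunks.
      textChars += length;
    }
  }

  static CountingHandler parse(String xml) throws Exception {
    CountingHandler handler = new CountingHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler;
  }

  public static void main(String[] args) throws Exception {
    CountingHandler h = parse("<a><b>xy</b><c>z</c></a>");
    System.out.println(h.elements + " elements, " + h.textChars + " text chars");
  }
}
```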

From that, I thought about creating an XML parser that only does the absolute minimum of object allocation. Currently I need:

My test program, for parsing http://magnatune.com/info/song_info.xml, looks like this:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class XmlParserDemo {
  public static void main(String[] args) throws IOException {
    List<Map<String, String>> allSongs = new ArrayList<Map<String, String>>();

    InputStream fis = new FileInputStream("d:/song_info.xml");
    try {
      XmlParser parser = new XmlParser(new BufferedInputStream(fis));
      if (parser.element("AllSongs")) {
        while (parser.element("Track")) {
          Map<String, String> track = new LinkedHashMap<String, String>();
          while (parser.element()) {
            String name = parser.getElementName();
            String value = parser.text();
            track.put(name, value);
            parser.endElement();
          }
          allSongs.add(track);
          parser.endElement();
        }
        parser.endElement();
      }
    } finally {
      fis.close();
    }
  }
}

This code looks better than my experiments with the XMLEventReader. Now the only missing part would be the XmlParser class mentioned in the code above. Do you know if someone has written that code before? It's really just a pet project of mine, but I'm curious how much the old statement "Object creation is expensive" is worth anymore.

Yes, I know that LinkedHashMaps use a lot of memory. It's really just the parsing part that I want to be memory-efficient. Everything else is just there to make a simple example.

Upvotes: 1

Views: 1686

Answers (1)

Jon Skeet

Reputation: 1500425

"Object creation is expensive" hasn't been true for quite a long time in Java. Allocation is usually dirt cheap (move a pointer) and garbage collection has come a long way.

I would definitely use an XML API which lets you do what you want easily rather than worrying too much about excessive memory allocation, unless you think you're going to be pushing your performance boundaries.

I'm sure there are XML APIs designed to have a particularly small memory footprint - but just how large are your XML files? If they're small enough to fit into memory easily, I'd just not worry about it... and if they're too large for that you really need to be thinking of a streaming API anyway. I suspect the band of sizes where a particularly efficient parser could fit it in memory but a "normal" one couldn't is relatively small, in terms of applicability.
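As a rough sketch of what the streaming route could look like, here is the question's loop written against the standard StAX XMLStreamReader instead of the hypothetical XmlParser. The class name StaxTrackDemo and the helper readTracks are made up for this example, and it assumes each child of Track contains only text, as in the question's code.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of any library.
class StaxTrackDemo {

  // Reads <AllSongs><Track><field>text</field>...</Track>...</AllSongs>
  // one event at a time, so the whole document is never held in memory.
  static List<Map<String, String>> readTracks(Reader in) throws XMLStreamException {
    List<Map<String, String>> tracks = new ArrayList<>();
    XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
    try {
      Map<String, String> track = null;
      while (reader.hasNext()) {
        int event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
          String name = reader.getLocalName();
          if ("Track".equals(name)) {
            track = new LinkedHashMap<>();
          } else if (track != null) {
            // getElementText() reads up to the matching end tag;
            // it throws if the element contains child elements.
            track.put(name, reader.getElementText());
          }
        } else if (event == XMLStreamConstants.END_ELEMENT
            && "Track".equals(reader.getLocalName())) {
          tracks.add(track);
          track = null;
        }
      }
    } finally {
      reader.close();
    }
    return tracks;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<AllSongs><Track><artist>A</artist><trackname>T1</trackname></Track>"
        + "<Track><artist>B</artist></Track></AllSongs>";
    System.out.println(readTracks(new StringReader(xml)));
  }
}
```

For a real file, the Reader would wrap a buffered file stream rather than a StringReader. This still allocates a String per element name and value, which is exactly the cost the question asks about; the point is only that memory use stays bounded by one track at a time on the parsing side.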

Upvotes: 1
