Reputation: 41627
At work I am parsing large XML files using the DefaultHandler class. Doing that, I noticed that this interface allocates many Strings for element names, attribute names, attribute values, and so on.
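For illustration, the relevant DefaultHandler callbacks look roughly like this (a hypothetical handler, just to show where the Strings come from); the element and attribute names and the attribute values are handed to me as ready-made Strings, while only the text content arrives as a char[]:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, only to show which callback parameters are Strings.
public class AllocationDemoHandler extends DefaultHandler {

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        // element and attribute names, and attribute values, all arrive
        // as String objects, whether this handler needs them or not
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // text content, by contrast, arrives as a char[] slice
    }
}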
From that, I thought about creating an XML parser that does only the absolute minimum of object allocation. Currently I only need to enter an element (optionally checking its name), read the current element's name and text content, and leave the element again.
My test program, for parsing http://magnatune.com/info/song_info.xml, looks like this:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class XmlParserDemo {
    public static void main(String[] args) throws IOException {
        List<Map<String, String>> allSongs = new ArrayList<Map<String, String>>();
        InputStream fis = new FileInputStream("d:/song_info.xml");
        try {
            XmlParser parser = new XmlParser(new BufferedInputStream(fis));
            if (parser.element("AllSongs")) {
                while (parser.element("Track")) {
                    Map<String, String> track = new LinkedHashMap<String, String>();
                    while (parser.element()) {
                        String name = parser.getElementName();
                        String value = parser.text();
                        track.put(name, value);
                        parser.endElement();
                    }
                    allSongs.add(track);
                    parser.endElement();
                }
                parser.endElement();
            }
        } finally {
            fis.close();
        }
    }
}
This code looks better than my experiments with the XMLEventReader. Now the only missing part would be the XmlParser class mentioned in the code above. Do you know if someone has written that code before? It's really just a pet project of mine, but I'm curious how much the old statement "object creation is expensive" is still worth.
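For reference, here is a rough sketch of the shape I have in mind for that class, backed by StAX for now (so it is not allocation-free, it only shows the pull-style API the test program assumes). Wrapping the checked XMLStreamException in IOException is just an assumption so it fits the throws clause of my main method:

import java.io.IOException;
import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Sketch of the missing XmlParser class on top of StAX. Not allocation-free.
public class XmlParser {

    private final XMLStreamReader reader;
    private boolean atEndElement; // positioned on a not-yet-acknowledged end tag

    public XmlParser(InputStream in) throws IOException {
        try {
            this.reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
    }

    // Advances to the next child element of the current element, if any.
    public boolean element() throws IOException {
        if (atEndElement) {
            return false; // the enclosing element's end tag was already reached
        }
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    return true;
                }
                if (event == XMLStreamConstants.END_ELEMENT) {
                    atEndElement = true; // leave it for endElement() to acknowledge
                    return false;
                }
                // whitespace, comments, processing instructions: skip
            }
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
        return false;
    }

    // Advances to the next child element and checks its name. Note that a
    // mismatched element is still entered; good enough for the usage above.
    public boolean element(String name) throws IOException {
        return element() && reader.getLocalName().equals(name);
    }

    public String getElementName() {
        return reader.getLocalName();
    }

    // Returns the text content of the current (text-only) element and
    // stops at its end tag.
    public String text() throws IOException {
        try {
            String value = reader.getElementText();
            atEndElement = true;
            return value;
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
    }

    // Leaves the current element, skipping anything left inside it.
    public void endElement() throws IOException {
        if (atEndElement) {
            atEndElement = false; // end tag was already consumed
            return;
        }
        try {
            int depth = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 0) {
                        return;
                    }
                    depth--;
                }
            }
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
    }
}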
Yes, I know that LinkedHashMaps use a lot of memory. It's really just the parsing part that I want to be memory-efficient; everything else is only there to keep the example simple.
Upvotes: 1
Views: 1686
Reputation: 1500425
"Object creation is expensive hasn't been true" for quite a long time in Java. Allocation is usually dirt cheap (move a pointer) and garbage collection has come a long way.
I would definitely use an XML API which lets you do what you want easily rather than worrying too much about excessive memory allocation, unless you think you're going to be pushing your performance boundaries.
I'm sure there are XML APIs designed to have a particularly small memory footprint - but just how large are your XML files? If they're small enough to fit into memory easily, I'd just not worry about it... and if they're too large for that, you really need to be thinking of a streaming API anyway. I suspect the band of sizes where a particularly efficient parser could fit the document in memory but a "normal" one couldn't is relatively small, in terms of applicability.
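For example, something along these lines with plain StAX (the file path and element name are borrowed from your test program) streams the file without ever holding more than the current event in memory:

import java.io.BufferedInputStream;
import java.io.FileInputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamingDemo {
    public static void main(String[] args) throws Exception {
        FileInputStream fis = new FileInputStream("d:/song_info.xml");
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new BufferedInputStream(fis));
            int tracks = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("Track")) {
                    tracks++; // process the track here instead of storing it
                }
            }
            System.out.println(tracks + " tracks");
        } finally {
            fis.close();
        }
    }
}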
Upvotes: 1