Reputation: 1808
I have two methods that parse the same XML document, one using DOM and one using JDOM. I expected JDOM to run faster and consume less memory than DOM, but in my benchmark JDOM actually ran several times slower and consumed far more memory. I'm using JMH as the benchmarking framework.
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.io.File;
import java.util.concurrent.TimeUnit;
import java.io.IOException;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

// org.w3c.dom.Document and org.jdom2.Document share a simple name, so neither
// is imported; each is referred to by its fully qualified name below.
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 1, time = 200, timeUnit = TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class MyBenchmark {

    @Param({"1.xml"})
    public String xml;

    // Parse the file into a W3C DOM tree with the default JAXP parser.
    @Benchmark
    public void DOM() {
        try {
            File fXmlFile = new File(xml);
            DocumentBuilderFactory dbFactory =
                    DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            org.w3c.dom.Document doc = dBuilder.parse(fXmlFile);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Parse the same file into a JDOM2 tree with JDOM's own SAXBuilder.
    @Benchmark
    public void JDOM() {
        SAXBuilder builder = new SAXBuilder();
        File xmlFile = new File(xml);
        try {
            org.jdom2.Document document = builder.build(xmlFile);
        } catch (IOException io) {
            System.out.println(io.getMessage());
        } catch (JDOMException jdomex) {
            System.out.println(jdomex.getMessage());
        }
    }
}
DOM results
Benchmark                            (xml)  Mode  Cnt         Score     Error   Units
MyBenchmark.DOM                      1.xml    ss   10       126.823 ±  16.821   ms/op
MyBenchmark.DOM:·gc.alloc.rate       1.xml    ss   10        92.618 ±   2.481  MB/sec
MyBenchmark.DOM:·gc.alloc.rate.norm  1.xml    ss   10  60869076.800 ± 130.041    B/op
JDOM2 results
Benchmark                             (xml)  Mode  Cnt           Score     Error   Units
MyBenchmark.JDOM                      1.xml    ss   10         789.941 ±  81.293   ms/op
MyBenchmark.JDOM:·gc.alloc.rate       1.xml    ss   10        2248.753 ± 141.240  MB/sec
MyBenchmark.JDOM:·gc.alloc.rate.norm  1.xml    ss   10  3037712408.000 ±   0.001    B/op
The document is 12 MB in size and contains 192,000 elements, of which 38,400 are level 1 elements. Its structure is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
  <level_1 Element_Number="1">
    <level_2 Attribute_Level_2="Attribute_Level_2">
      <level_3_1 Attribute_Level_3="Attribute_Level_3">test</level_3_1>
      <level_3_2 Attribute_Level_3="Attribute_Level_3">test</level_3_2>
      <level_3_3 Attribute_Level_3="Attribute_Level_3">test</level_3_3>
    </level_2>
  </level_1>
  <level_1 Element_Number="2">
    <level_2 Attribute_Level_2="Attribute_Level_2">
      <level_3_1 Attribute_Level_3="Attribute_Level_3">test</level_3_1>
      <level_3_2 Attribute_Level_3="Attribute_Level_3">test</level_3_2>
      <level_3_3 Attribute_Level_3="Attribute_Level_3">test</level_3_3>
    </level_2>
  </level_1>
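For anyone who wants to reproduce the test, a file of this shape could be generated with something like the sketch below. The class name SampleXmlGenerator and its methods are hypothetical, and the indentation and line breaks are a guess at the original layout, so the resulting file size may differ somewhat from 12 MB.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: writes a document with the structure shown above.
final class SampleXmlGenerator {

    /** Writes a Root element containing level1Count level_1 elements. */
    static void write(Writer out, int level1Count) throws IOException {
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Root>\n");
        for (int i = 1; i <= level1Count; i++) {
            out.write("  <level_1 Element_Number=\"" + i + "\">\n");
            out.write("    <level_2 Attribute_Level_2=\"Attribute_Level_2\">\n");
            // Three level 3 children per level_2, named level_3_1 .. level_3_3.
            for (int j = 1; j <= 3; j++) {
                out.write("      <level_3_" + j
                        + " Attribute_Level_3=\"Attribute_Level_3\">test</level_3_"
                        + j + ">\n");
            }
            out.write("    </level_2>\n  </level_1>\n");
        }
        out.write("</Root>\n");
    }

    /** Convenience wrapper that returns the document as a String. */
    static String generate(int level1Count) {
        StringWriter sw = new StringWriter();
        try {
            write(sw, level1Count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) throws IOException {
        // 38,400 level_1 elements, each holding 5 elements in total,
        // gives the 192,000 elements mentioned above (plus Root).
        try (Writer w = Files.newBufferedWriter(Paths.get("1.xml"),
                StandardCharsets.UTF_8)) {
            write(w, 38_400);
        }
    }
}
```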
Can anyone explain this? I'm using JDOM 2.0.2, by the way.
Edit: DOM vs JDOM benchmarks for small documents (5000 to 25,000 elements)
Upvotes: 1
Views: 2170
Reputation: 163322
It doesn't match my experience. See Appendix A of http://www.saxonica.com/papers/xmlprague-2018mhk.pdf where I reported identical times for parsing/tree-building for DOM and JDOM2. That doesn't mean your figures are wrong, of course, it just means there is something about them that is specific to what you were measuring that might not extrapolate to a different scenario.
But why are you so coy about giving us actual numbers? What was the document size, what were the actual measurements? I was measuring a 10 MB XMark source document: what were you measuring?
==UPDATE==
I have now realised that I am not building the JDOM2 tree using the tree builder supplied with JDOM2, I am building it with Saxon's JDOM2 tree builder. So I changed it to use the JDOM2 builder - and it now goes a little faster: between 89.1 and 91.2ms, compared with 111.8ms for DOM.
But I'm also using Saxon's DOM builder rather than the native one. So let's change that too. The time for DOM now comes down to 74ms, which is fairly comparable with your figures.
I think (from what I have read) that the reason the DOM builder is faster is that it uses lazy building techniques: that is, it leaves some of the work to be done later, on first access to the data. This is why read-access to the DOM is not thread-safe; even though you are only using read methods at the API level, they are causing internal updates to the stored tree.
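One way to check whether lazy building is what makes the difference would be to switch it off and to walk the whole tree after parsing, so that any deferred work is included in the timing. The sketch below assumes the JAXP parser is Xerces-based (as the JDK's built-in one is), since "defer-node-expansion" is a Xerces-specific feature; the class and method names are made up for illustration, and I have not verified that this accounts for the figures above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
final class DeferredDomDemo {

    // Xerces-specific feature; parsers that are not Xerces-based may reject it.
    static final String DEFER =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Parses xml into a DOM tree, requesting deferred expansion on or off. */
    static Document parse(String xml, boolean deferred) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            try {
                factory.setFeature(DEFER, deferred);
            } catch (ParserConfigurationException e) {
                // Feature not supported: fall back to the parser's default.
            }
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Visits every node, which forces any deferred nodes to be expanded. */
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countElements(c);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<Root><level_1><level_2><a>test</a><b>test</b>"
                + "</level_2></level_1></Root>";
        for (boolean deferred : new boolean[] {true, false}) {
            long t0 = System.nanoTime();
            Document doc = parse(xml, deferred);
            long t1 = System.nanoTime();
            int count = countElements(doc);
            long t2 = System.nanoTime();
            System.out.printf("deferred=%b parse=%d us walk=%d us elements=%d%n",
                    deferred, (t1 - t0) / 1000, (t2 - t1) / 1000, count);
        }
    }
}
```

On a tiny inline string the timings mean nothing; the point is only the shape of the comparison. In a real benchmark the same two steps (parse, then full traversal) would go inside the measured method for both DOM and JDOM.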
Upvotes: 3