Piotr Gwiazda

Reputation: 12222

What does JAXB use under the hood?

I'm running a very basic performance test that unmarshals a huge zipped XML file into a simple data structure.

Each variant uses the same input stream and the same unmarshaller, created from one JAXBContext.

1

Simple approach - default JAXB - takes 18 seconds

unmarshaller.unmarshal(createFileInputStream());

2

Wrap the input stream in a SAX InputSource to (I think) force SAX - takes 20 seconds

unmarshaller.unmarshal(new InputSource(createFileInputStream()));

3

Try to force StAX - takes 40 seconds!

XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(createFileInputStream());
unmarshaller.unmarshal(reader);

For comparison, a simple StAX stream reader loop that extracts the same data needs only 14 seconds.

I was looking at "Can JAXB parse large XML files in chunks".

Question

Why is the default approach (passing the input stream) faster than both SAX and StAX? What is used by default?

Why is the StAX approach so slow when it is the recommended approach in the question mentioned above?

Upvotes: 0

Views: 136

Answers (1)

winne2

Reputation: 2250

Some thoughts:

  • Just debug/look at the source code. You will probably see that unmarshaller.unmarshal(inputStream) under the hood does exactly what you do in your 2nd approach: Wrap the input stream in a SAX InputSource. So your approach 1 and 2 should be equally fast.
  • Running a single pass of any Java benchmark often does not give useful results, because the JIT has no chance to warm up. To estimate your program's runtime behavior, add quite a few "dry runs" first and only then start measuring. Always measure multiple passes and look at min/max/mean/median instead of trusting a single number (an expensive background task could, for example, be running during one of the passes).
  • Your 3rd approach should indeed force StAX instead of SAX. As @Kayaman says, you should look at the actual class name of the reader (reader.getClass().getName()) to make sure it is the implementation you expect.
    • StAX readers will not always be faster than SAX readers; it depends on the payload and on the implementation you use
    • Woodstox or Aalto will typically significantly outperform the JDK's built-in, Xerces-based parser
    • note that you can (and should) reuse the XMLInputFactory instead of always creating a new instance
    • note that you should close the XMLStreamReader after use (may have performance benefit according to e.g. Woodstox documentation)
  • A Java 17 JVM will in many use cases give a nice additional performance gain over Java 11 or 8 (expect roughly 5 to 15%)
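
To make the measurement advice above concrete, here is a minimal sketch of a warm-up-then-measure harness. It uses only the JDK's StAX API and leaves JAXB out. The class name StaxBenchmarkSketch, the helper methods, the generated <items>/<item> payload, and the pass counts are all assumptions made for illustration; swap in your own input stream and your own unmarshalling call as the unit of work. It also prints the reader's class name, reuses a single XMLInputFactory, and closes each reader after use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxBenchmarkSketch {

    // Created once and reused; newInstance() does a provider lookup on every call.
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Hypothetical stand-in for the real file: <items><item>0</item>...</items>
    public static byte[] buildSampleXml(int itemCount) {
        StringBuilder sb = new StringBuilder("<items>");
        for (int i = 0; i < itemCount; i++) {
            sb.append("<item>").append(i).append("</item>");
        }
        return sb.append("</items>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // The unit of work being timed: one full pass that counts <item> start tags.
    public static int countItems(byte[] xml) throws XMLStreamException {
        XMLStreamReader reader = FACTORY.createXMLStreamReader(new ByteArrayInputStream(xml));
        try {
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    count++;
                }
            }
            return count;
        } finally {
            // XMLStreamReader is not AutoCloseable, so close it explicitly.
            reader.close();
        }
    }

    // Runs untimed warm-up passes, then timed passes; returns sorted timings in nanoseconds.
    public static long[] measure(byte[] xml, int warmups, int passes) throws XMLStreamException {
        for (int i = 0; i < warmups; i++) {
            countItems(xml);
        }
        long[] nanos = new long[passes];
        for (int i = 0; i < passes; i++) {
            long start = System.nanoTime();
            countItems(xml);
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] xml = buildSampleXml(100_000);

        // Check which StAX implementation is actually in use.
        XMLStreamReader probe = FACTORY.createXMLStreamReader(new ByteArrayInputStream(xml));
        System.out.println("StAX implementation: " + probe.getClass().getName());
        probe.close();

        long[] t = measure(xml, 10, 20);
        System.out.printf("min=%d us, median=%d us, max=%d us%n",
                t[0] / 1_000, t[t.length / 2] / 1_000, t[t.length - 1] / 1_000);
    }
}
```

For anything beyond a quick sanity check, a dedicated harness such as JMH is usually the safer choice, since it handles warm-up, forking, and dead-code elimination for you.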

Upvotes: 0
