anand

Reputation: 103

What is the fastest way to read a large XML file?

I am using XSLT to read a 300MB XML file. I need to check the contents of some tags and, based on that, print other elements as output.

It runs for a very long time (10 minutes) and then terminates with the message 'Killed' and no other output.

Is there a faster way? Could I read it with a SAX parser in Java? Thanks in advance.

Upvotes: 0

Views: 222

Answers (2)

Jon Hanna

Reputation: 113242

An XSLT stylesheet can be turned into a state machine in much the same way a regular expression can. Some XSLT libraries have a "compile" option, so you can weigh the one-off cost of compiling against the benefit of reusing the compiled form.
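As one illustration of a "compile" option, JAXP in Java lets you build a `Templates` object once and create transformers from it repeatedly. This is only a sketch: the stylesheet and input below are made up for the example, and whether the processor really builds a state machine internally depends on the implementation.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class CompiledStylesheet {
    // Hypothetical stylesheet for illustration: outputs the number of <item> elements.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"count(//item)\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Pay the stylesheet compilation cost once.
    static Templates compile(String xslt) throws Exception {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    // Reuse the compiled form for each document.
    static String apply(Templates templates, String xml) throws Exception {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates compiled = compile(XSLT);
        System.out.println(apply(compiled, "<r><item/><item/></r>"));
        System.out.println(apply(compiled, "<r><item/></r>"));
    }
}
```

Compiling helps most when the same stylesheet is applied many times; for a single pass over one 300MB file the gain is likely small.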

When this is done it can be extremely fast, though the nature of the stylesheet affects this. If the template matching can all be done in a forward-only manner (or can be internally rewritten into a form that can), it will be much faster than if something requires the processor to move many steps back in the document.

Even the best XSLT implementation, though, will probably be slower than the best forward-only parse of the XML (whether push, like SAX, or pull). However, much as with the XSLT approach, if the forward-only parser has to hold on to a lot of state about previously encountered elements so that it can respond to later elements in a way that refers to them, this can slow things down considerably. Eventually you reach the point where a DOM approach (whether for the full document or for subtrees of it) becomes comparable in speed and is likely simpler, because referring to previously encountered elements is precisely what DOM makes easiest.

Hence whichever approach is taken, if you can see ways to rewrite parts that refer "up" or "back" in the document so that they do this less, you'll gain a greater benefit.
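A forward-only SAX pass that keeps only a little state per record might look like the sketch below. The element names `record`, `name` and `status`, and the rule "emit the name when the status is active", are assumptions made up for illustration, since the question does not show the real document structure.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.function.Consumer;

public class SaxFilter {
    // Holds state for the current <record> only, so memory use stays flat.
    static class RecordHandler extends DefaultHandler {
        private final Consumer<String> sink;
        private final StringBuilder text = new StringBuilder();
        private String name;
        private String status;

        RecordHandler(Consumer<String> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);              // start collecting text for this element
            if (qName.equals("record")) {   // new record: forget the previous one
                name = null;
                status = null;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            switch (qName) {
                case "name":
                    name = text.toString().trim();
                    break;
                case "status":
                    status = text.toString().trim();
                    break;
                case "record":
                    // Decide only once the whole record has been seen.
                    if ("active".equals(status) && name != null) {
                        sink.accept(name);
                    }
                    break;
                default:
                    break;
            }
        }
    }

    static void emitActiveNames(InputSource source, Consumer<String> sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(source, new RecordHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
            + "<record><name>a</name><status>active</status></record>"
            + "<record><name>b</name><status>closed</status></record>"
            + "</records>";
        emitActiveNames(new InputSource(new StringReader(xml)), System.out::println);
    }
}
```

For a real file you would pass something like `new InputSource(new java.io.FileInputStream(path))` instead of the inline string. The handler waits until the end of each record before deciding, so it never has to look "back" further than the record it is in.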

Upvotes: 1

Michael Kay

Reputation: 163322

The first step is to determine whether the time is spent in the XML parser or in the XSLT processor. Try (a) a transformation that does nothing (<xsl:template match="/"/>) and (b) a transformation that copies everything (<xsl:template match="/"><xsl:copy-of select="."/></xsl:template>), and compare the results with your actual transformation.
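If you are driving the transformation from Java, the two baseline runs could be timed with something like the sketch below. It assumes the standard JAXP API and whatever processor `TransformerFactory.newInstance()` picks up; the class and method names are made up for the example.

```java
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;
import java.io.OutputStream;
import java.io.StringReader;

public class TransformTiming {
    static final String HEADER =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">";

    // (a) does nothing: the cost is mostly parsing and tree building
    static final String DO_NOTHING =
        HEADER + "<xsl:template match=\"/\"/></xsl:stylesheet>";

    // (b) copies everything: adds the cost of serializing the whole document
    static final String COPY_ALL =
        HEADER + "<xsl:template match=\"/\"><xsl:copy-of select=\".\"/></xsl:template></xsl:stylesheet>";

    // Runs one transformation and returns the elapsed time in milliseconds.
    static long timeMillis(String xslt, Source input, OutputStream sink) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        long start = System.nanoTime();
        transformer.transform(input, new StreamResult(sink));
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Output sink that throws the bytes away, so disk writes do not skew the timing.
    static OutputStream discard() {
        return new OutputStream() {
            @Override public void write(int b) { }
            @Override public void write(byte[] b, int off, int len) { }
        };
    }

    public static void main(String[] args) throws Exception {
        File input = new File(args[0]); // path to the large XML file
        System.out.println("do nothing: "
            + timeMillis(DO_NOTHING, new StreamSource(input), discard()) + " ms");
        System.out.println("copy all:   "
            + timeMillis(COPY_ALL, new StreamSource(input), discard()) + " ms");
    }
}
```

If both baselines are already slow or run out of memory, the problem is in parsing or tree building rather than in your stylesheet logic.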

XSLT processors vary a lot, and if at all possible you should try several. You may also need to experiment with different ways of using your XSLT processor; for example, some have an internal tree model that is likely to be much more efficient than a DOM. So come back here with details of your processor and how you are using it.

Finally, "a long time" tells us nothing. Tell us how long it actually takes, and we can tell you whether that's reasonable, or whether something is badly wrong somewhere.

Upvotes: 3
