KnomDeGuerre

Reputation: 357

XSLT Transform Efficiency

I am a support engineer and our company's product allows XSLT transforms to customize outputs.

I made an XSL transform for this purpose. It works well for source files of typical size (several hundred KB), but occasionally a really huge (10 MB) source file comes along. In that case, the output is never generated, even if I let it grind for several days.

The software engineering team tested it and discovered that the transform on the large source file in question is indeed very slow (days or more) if our product is compiled to use the transform engine in .NET 1.1, but if they compile it with .NET 2.0, it is plenty fast (about 1-2 minutes).

The long-term solution, obviously, is to wait for the next release.

For the short term I am wondering the following: 1) Is XSLT flexible enough that there are more efficient and less efficient ways to achieve the same result? For example, is it possible that the way I structured the XSL forces the transform engine to iterate from the beginning of the source file many, many times, taking longer and longer as the next result piece gets farther and farther from the beginning (Schlemiel the Painter)? Or 2) does it depend more on how the transform engine interprets the XSL?

If 2) is the case, I don't want to waste a lot of time trying to improve the XSL (I am not a big XSL genius; it was hard enough for me to achieve what little I did...).

Thanks!

Upvotes: 9

Views: 2918

Answers (5)

Dimitre Novatchev

Reputation: 243529

To detect when to start a new section, I did this:

<xsl:if test="@TheFirstCol > preceding-sibling::*[1]/@TheFirstCol">

Could this be causing a lot of re-iteration?

Definitely. The algorithm you've chosen is O(N²) and will be very slow with a sufficient number of siblings, regardless of the implementation language.

Here is an efficient algorithm using keys:

Solution1:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <xsl:key name="kC1Value" match="@c1" use="."/>

    <xsl:template match="/">
      <xsl:for-each select="*/x[generate-id(@c1) = generate-id(key('kC1Value',@c1)[1])]">

       <xsl:value-of select="concat('&#xA;',@c1)"/>

       <xsl:for-each select="key('kC1Value',@c1)">
         <xsl:value-of select="'&#xA;'"/>
         <xsl:for-each select="../@*[not(name()='c1')]">
           <xsl:value-of select="concat('   ', .)"/>
         </xsl:for-each>
       </xsl:for-each>
      </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Unfortunately, XslTransform (.NET 1.1) has a notoriously inefficient implementation of the generate-id() function.

The following may be faster with XslTransform:

Solution2:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <xsl:key name="kC1Value" match="@c1" use="."/>

    <xsl:template match="/">
      <xsl:for-each select="*/x[count(@c1 | key('kC1Value',@c1)[1]) = 1]">

       <xsl:value-of select="concat('&#xA;',@c1)"/>

       <xsl:for-each select="key('kC1Value',@c1)">
         <xsl:value-of select="'&#xA;'"/>
         <xsl:for-each select="../@*[not(name()='c1')]">
           <xsl:value-of select="concat('   ', .)"/>
         </xsl:for-each>
       </xsl:for-each>
      </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

When applied on the following small XML document:

<t>
 <x c1="1" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="1" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="1" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="1" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="2" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="2" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="2" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="2" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="4" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="4" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="4" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="4" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="7" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="7" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="7" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="7" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="8" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="8" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="8" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="8" c2="1" c3="1" c4="0" c5="0"/>
</t>

both solutions produced the wanted result:

1
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
2
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
3
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
4
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
5
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
   0   0   0   0
   0   1   0   0
6
   2   0   0   0
   1   1   0   0
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
7
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
8
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0

From the above small XML file I generated a 10MB XML file by copying every element 6250 times (using another XSLT transformation :) ).
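
That multiplying transformation isn't shown in the answer; a rough sketch of one way to do it (the copies parameter and the recursive copy approach are my assumptions, not necessarily what was actually used) could look like this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" indent="yes"/>
 <!-- Assumed repeat count: 6250 copies of each x element gives roughly 10 MB -->
 <xsl:param name="copies" select="6250"/>

 <xsl:template match="/t">
   <t>
     <xsl:apply-templates select="x"/>
   </t>
 </xsl:template>

 <!-- Emit the current x element $n times by simple recursion
      (deep recursion; a real run might need a divide-and-conquer variant) -->
 <xsl:template match="x">
   <xsl:param name="n" select="$copies"/>
   <xsl:if test="$n &gt; 0">
     <xsl:copy-of select="."/>
     <xsl:apply-templates select=".">
       <xsl:with-param name="n" select="$n - 1"/>
     </xsl:apply-templates>
   </xsl:if>
 </xsl:template>
</xsl:stylesheet>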

With the 10 MB XML file and with XslCompiledTransform (.NET 2.0+), the two solutions had the following transformation times:

Solution1: 3.3 sec.
Solution2: 2.8 sec.

With XslTransform (.NET 1.1), Solution2 ran for 1622 sec., that is, about 27 minutes.

Upvotes: 3

trebormf

Reputation: 3224

Normally, if you see a non-linear increase in processing time vs. input size, you should suspect your code more than the framework. But since the problem goes away when the tool is compiled with .NET 2.0, all bets are off.

With XSLT, it's hard to create a non-linear performance curve if you do all your parsing with straight template matches:

<xsl:template match="foo">
  <!--OUTPUT-->
  <xsl:apply-templates/>
  <!--OUTPUT-->
</xsl:template>

<xsl:template match="bar">
  <!--OUTPUT-->
  <xsl:apply-templates/>
  <!--OUTPUT-->
</xsl:template>

Pay careful attention to anywhere you might have resorted to <xsl:for-each> for parsing; template matches are virtually always a better way to achieve the same result.
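
For instance (using made-up element names, just to illustrate the pattern), a nested loop like this:

<xsl:for-each select="order">
  <xsl:for-each select="item">
    <!--OUTPUT for each item-->
  </xsl:for-each>
</xsl:for-each>

can usually be rewritten as plain template matches:

<xsl:template match="order">
  <xsl:apply-templates select="item"/>
</xsl:template>

<xsl:template match="item">
  <!--OUTPUT for each item-->
</xsl:template>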

One way to troubleshoot this performance problem is to recreate your XSLT one template-match at a time, testing the processing time after adding each match. You might start with this match:

<xsl:template match="*">
  <xsl:copy>                   <!--Copy node                   -->
    <xsl:copy-of select="@*"/> <!--Copy node attributes         -->
    <xsl:apply-templates />    <!--Process children             -->
  </xsl:copy>
</xsl:template>

This will match and copy every node, one at a time, to a new document. This should not exhibit a non-linear increase in processing time vs. input size (if it does, then the problem is not with your XSLT code).

As you recreate your XSLT, if you add a template-match that suddenly kills performance, comment out every block inside the template. Then, uncomment one block at a time, testing the processing time each iteration, until you find the block that causes the problem.

Upvotes: 4

Maxime Rouiller

Reputation: 13699

Upon looking up your problem, I found a KB article at Microsoft about it. You can see it here.

They say that the XSLT transform in .NET 1 has some performance issues and that they can offer a quick fix.

If you want to try to troubleshoot the problem, there is an XSLT profiler available here.

Otherwise, you can follow the links given on the Microsoft website for optimizing speed issues with XSLT (link).

Upvotes: 1

Matt Large

Reputation: 389

One thing worth checking is whether your XSLT does a lot of lookups into other parts of the XML document, i.e. you are in one context node and look up a value in another part of the document, or even in another document. If you are doing this, it can hit performance quite hard, and you should consider using xsl:key and the key() function for it. That tells the processor to build a fast lookup index on the data in question.
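
As a rough sketch (the Customer element and the attribute names here are invented for illustration), a repeated cross-reference like this:

<xsl:value-of select="//Customer[@id = current()/@custId]/@name"/>

can be replaced by declaring a key once at the top level of the stylesheet and using it at each lookup site:

<xsl:key name="kCustomerById" match="Customer" use="@id"/>

<xsl:value-of select="key('kCustomerById', @custId)/@name"/>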

I was once building an XSLT that took 8 hours to run (with lots of cross references) and switching to use keys gave it a massive speed boost.

Upvotes: 2

Marco

Reputation: 15027

I'm not familiar with the .NET implementations, but there are a few things you can do in general to speed up processing of large documents:

  • Avoid using "//" in XPath expressions unless absolutely necessary (see the short sketch after this list).
  • If you only need the first or only element that matches an XPath expression, use the "[1]" qualifier, e.g. "//iframe[1]". Many processors implement optimizations for this.
  • Whenever possible, when dealing with huge XML input, see if you can design a solution around a stream-based parser (like SAX) instead of a DOM-based parser.
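
On the first point, a minimal sketch (the element names are hypothetical): when you know the document structure, prefer an explicit rooted path over a full-document scan.

<!-- Slower: "//" forces the processor to search the whole tree -->
<xsl:apply-templates select="//row"/>

<!-- Faster: an explicit path from the root -->
<xsl:apply-templates select="/report/section/row"/>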

Upvotes: 5
