Siraf

Reputation: 1292

How to efficiently modify a large XML document slightly with Microsoft XSLT 1.0

I would like to transform this xml:

<Root>
  <Result>
    <Message>
      <Header>
        <!-- Hundreds of child nodes -->        
      </Header>
      <Body>
        <Node1>
          <!-- Hundreds of child nodes -->
          <Node2>
            <!-- Hundreds of child nodes -->
            <Node3>
              <!-- Hundreds of child nodes -->
              <NodeX>value 1 to be changed</NodeX>
              <!-- Hundreds of child nodes -->
              <Node4>
                <!-- Hundreds of child nodes -->
                <NodeY>value 2 to be changed</NodeY>
              </Node4>
            </Node3>
          </Node2>          
        </Node1>        
      </Body>
      <RealValuesRoot>
        <!-- These two nodes -->
        <Value ID="1">this value must replace the value of Node X</Value>
        <Value ID="2">this value must replace the value of Node Y</Value>
      </RealValuesRoot>
    </Message>
  </Result>
  <!-- Hundreds to thousands of similar MessageRoot nodes -->
</Root>

Into this xml:

<Root>
  <Result>
    <Message>
      <Header>
        <!-- Hundreds of child nodes -->
      </Header>
      <Body>
        <Node1>
          <!-- Hundreds of child nodes -->
          <Node2>
            <!-- Hundreds of child nodes -->
            <Node3>
              <!-- Hundreds of child nodes -->
              <NodeX>this value must replace the value of Node X</NodeX>
              <!-- Hundreds of child nodes -->
              <Node4>
                <!-- Hundreds of child nodes -->
                <NodeY>this value must replace the value of Node Y</NodeY>
              </Node4>
            </Node3>
          </Node2>
        </Node1>
      </Body>
    </Message>
  </Result>
  <!-- Hundreds to thousands of similar MessageRoot nodes -->
</Root>

The output is almost identical to the input except for the following changes:

  1. The X and Y node values must be replaced with the values of the /RealValuesRoot/Value nodes.
  2. The /RealValuesRoot node must be removed from the output.
  3. The rest of the xml must remain the same in the output.

The "Value" nodes have unique IDs that represent unique xpaths in the body of the message, e.g. ID 1 refers to xpath /Message/Body/Node1/Node2/Node3/NodeX.

I have to use Microsoft’s xslt version 1.0!!

I already have an xslt that works fine and does everything I want, yet I am not satisfied with the performance!

My xslt works as follows:

  1. I created a global string variable that acts like a key value pair, something like: 1:xpath1_2:xpath2_ … _N:xpathN. This variable relates the IDs of the "Value" nodes to the nodes in the message body that need to be replaced.

  2. The xslt iterates the input xml recursively starting from the root node.

  3. I compute the xpath for the current node then do one of the following:

    1. If the current xpath completely matches one of the xpaths in the global list then I replace its value with the value of the corresponding "Value" node and continue the iteration.
    2. If the current xpath refers to the "RealValuesRoot" node then I omit that node (don’t copy it to the output) and continue to iterate recursively.
    3. If the current xpath does not exist in the global ID-xpath string then I copy the complete node to the output and continue the iteration. (this happens e.g. in the /Message/Header node that will never contain any nodes that need to be replaced)
    4. If the current xpath partially matches one of the xpaths in the global list then I simply continue to iterate recursively until I reach one of the 3 cases above.

As I said, my xslt works fine, but I would like to improve the performance as much as possible. Please feel free to suggest completely new xslt logic! Your ideas and suggestions are welcome!

Upvotes: 3

Views: 586

Answers (2)

Tomalak

Reputation: 338128

The most efficient way to go about this is to use an XSL key.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- a key that indexes real values by their IDs -->
  <xsl:key name="kRealVal" match="RealValuesRoot/Value" use="@ID" />

  <!-- the identity template to copy everything -->    
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- ...except elements named <NodeX> -->
  <xsl:template match="*[starts-with(name(), 'Node')]">
    <xsl:variable name="myID" select="substring-after(name(), 'Node')" />
    <xsl:variable name="myRealVal" select="key('kRealVal', $myID)" />

    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:choose>
        <xsl:when test="$myRealVal">
          <xsl:value-of select="$myRealVal" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="node()" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

  <!-- the <RealValuesRoot> element can be trashed -->
  <xsl:template match="RealValuesRoot" />
</xsl:stylesheet>

Here is the live preview of this solution: http://www.xmlplayground.com/R78v0n
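
One caveat, given that the input holds hundreds to thousands of Message elements: a key on @ID alone is document-wide, so a lookup for ID 1 returns the matching Value of every message, not just the current one. Below is a minimal sketch of a per-message variant. It assumes that each Message carries its own RealValuesRoot and that NodeX and NodeY map to IDs 1 and 2 as in the question. The key name kRealValByMsg is made up for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- index each Value by "generated id of its enclosing Message" + "|" + "its own ID" -->
  <xsl:key name="kRealValByMsg" match="RealValuesRoot/Value"
           use="concat(generate-id(ancestor::Message[1]), '|', @ID)" />

  <!-- the identity template to copy everything -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- assumed mapping: NodeX takes Value 1 of the same Message -->
  <xsl:template match="NodeX">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:value-of select="key('kRealValByMsg', concat(generate-id(ancestor::Message[1]), '|', '1'))" />
    </xsl:copy>
  </xsl:template>

  <!-- assumed mapping: NodeY takes Value 2 of the same Message -->
  <xsl:template match="NodeY">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:value-of select="key('kRealValByMsg', concat(generate-id(ancestor::Message[1]), '|', '2'))" />
    </xsl:copy>
  </xsl:template>

  <!-- the <RealValuesRoot> element is dropped -->
  <xsl:template match="RealValuesRoot" />
</xsl:stylesheet>
```

The compound key value keeps lookups inside the current Message while still using the key index instead of a scan.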


Here is a proof-of-concept solution that uses Microsoft script extensions to do the heavy lifting:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  xmlns:script="http://tempuri.org/script"
>  
  <msxsl:script language="JScript" implements-prefix="script">
    var index = {};

    function getNode(context, xpath) {
      var theContext = context[0],
          theXpath = xpath[0].text,
          result;

      try {
        result = theContext.selectSingleNode(theXpath);
      } catch (ex) {
        // xpath is invalid. we could also just throw here
        // but let's return the empty node set.
        result = theContext.selectSingleNode("*[false()]");
      }
      return result;
    }
    function buildIndex(id, node) {
      var theNode = node[0];

      if (id) index[id] = theNode;
      return "";
    }
    function getValue(id) {
      return (id in index) ? index[id] : '';
    }
  </msxsl:script>


  <!-- this is the boilerplate to evaluate all the XPaths -->
  <xsl:variable name="temp">
    <xsl:for-each select="/root/source/map">
      <xsl:value-of select="script:buildIndex(generate-id(script:getNode(/, @xpath)), .)" />
    </xsl:for-each>
  </xsl:variable>

  <!-- the identity template to get things rolling -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <!-- actually evaluate $temp once, so the variable is being calculated -->
    <xsl:value-of select="$temp" />
    <xsl:apply-templates select="node() | @*" />
  </xsl:template> 

  <!-- all <value> nodes either have a related "actual value" or are copied as they are -->
  <xsl:template match="value">
    <xsl:copy>
      <xsl:copy-of select="@*" />

      <xsl:variable name="newValue" select="script:getValue(generate-id())" />
      <xsl:choose>
        <xsl:when test="$newValue">
          <xsl:value-of select="$newValue" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="node()" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

  <!-- the <source> element can be dropped -->
  <xsl:template match="source" />

</xsl:stylesheet>

It transforms

<root>
  <value id="foo">this is to be replaced</value>

  <source>
    <map xpath="/root/value[@id = 'foo']">this is the new value</map>
  </source>
</root>

to

<root>
  <value id="foo">this is the new value</value>
</root>

Maybe you can go that route in your set-up.

The line of thought is this:

  • Iterate all XPaths you have, evaluating them with .selectSingleNode().
  • Store each evaluation result (one node, ideally) along with its unique ID in an object as key-value pairs. This uses XSLT's generate-id() to get an ID from a node.
  • Now transform the input normally. For each node in question, get its ID and check if a "new value" actually exists for that node.
  • If it does, insert that new value, if it doesn't, go on transforming.

Tested successfully with msxsl.exe.

Of course this assumes that the input has those <map xpath="..."> elements, but that part is not really necessary and easily adaptable to your actual situation. You could build the index object from a long string of XPaths that you split() in JavaScript, for example.

Upvotes: 1

Ian Roberts

Reputation: 122364

"I compute the xpath for the current node then do one of the following..."

This is likely to be your inefficiency: if you're recalculating the path right back to the root every time, you're probably looking at an O(N^2) algorithm. Without seeing your XSLT this is rather speculative, but you might be able to trim this a bit by using parameters to pass the current path down the recursion. If your main algorithm is based on a standard identity template

<xsl:template match="@*|node()">
  <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
</xsl:template>

then change it to something like

<xsl:template match="@*|node()">
  <xsl:param name="curPath" />
  <xsl:copy>
    <xsl:apply-templates select="@*|node()">
      <xsl:with-param name="curPath" select="concat($curPath, '/', name())" />
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>

or whatever your logic is for building the paths you need. Now in your specific templates for the nodes you want to massage you already have the path to their parent node and you don't have to walk all the way up to the root every time.

You might need to add a

<xsl:template match="/">
  <xsl:apply-templates />
</xsl:template>

so you don't get a double slash at the front of $curPath.
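
As a rough sketch of how a specific template could consume that parameter: the hard-coded path and the ancestor::Message lookup below are assumptions based on the sample input in the question, not your actual mapping.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template that hands each node's own path down to its children -->
  <xsl:template match="@*|node()">
    <xsl:param name="curPath" />
    <xsl:copy>
      <xsl:apply-templates select="@*|node()">
        <xsl:with-param name="curPath" select="concat($curPath, '/', name())" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <!-- NodeX receives the path of its parent in $curPath -->
  <xsl:template match="NodeX">
    <xsl:param name="curPath" />
    <!-- one concat() instead of a walk back up to the root -->
    <xsl:variable name="myPath" select="concat($curPath, '/', name())" />
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:choose>
        <!-- assumed target path, taken from the sample input -->
        <xsl:when test="$myPath = '/Root/Result/Message/Body/Node1/Node2/Node3/NodeX'">
          <xsl:value-of select="ancestor::Message/RealValuesRoot/Value[@ID = '1']" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="node()">
            <xsl:with-param name="curPath" select="$myPath" />
          </xsl:apply-templates>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

  <!-- drop the RealValuesRoot -->
  <xsl:template match="RealValuesRoot" />
</xsl:stylesheet>
```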


"The xpaths are hardcoded in the xslt as a string in a global variable"

If instead of a string you represented this mapping in an XML structure then you could make use of the key mechanism to speed up your lookups:

<xsl:variable name="rtfLookupTable">
  <lookuptable>
    <val xpath="/first/xpath/expression" id="1" />
    <val xpath="/second/xpath/expression" id="2" />
    <!-- ... -->
  </lookuptable>
</xsl:variable>

<xsl:variable name="lookupTable" select="msxsl:node-set($rtfLookupTable)" />

<xsl:key name="valByXpath" match="val" use="@xpath" />

(Add xmlns:msxsl="urn:schemas-microsoft-com:xslt" to your xsl:stylesheet.) Or, if you want to avoid using the node-set extension function, an alternative definition could be

<xsl:variable name="lookupTable" select="document('')//xsl:variable[@name='rtfLookupTable']" />

which works by treating the stylesheet itself as a plain XML document.

Keys across multiple documents get a bit fiddly in XSLT 1.0, but it can be done. Essentially you have to switch the current context to point to the $lookupTable before calling the key function, so you need to save the current context in variables to let you refer to it later:

<xsl:template match="text()">
  <xsl:param name="curPath" />

  <xsl:variable name="dot" select="." />
  <xsl:variable name="slash" select="/" />

  <xsl:for-each select="$lookupTable">
    <xsl:variable name="valId" select="key('valByXpath', $curPath)/@id" />
    <xsl:choose>
      <xsl:when test="$valId">
        <xsl:value-of select="$slash//Value[@ID = $valId]" />
        <!-- or however you extract the right Value -->
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$dot" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>

Or indeed, why not just let the XSLT engine do the hard work for you? Rather than representing your mapping as a string

1:/path/to/node1_2:/path/to/node2

represent it as templates directly

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- copy everything as-is apart from exceptions below -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>

  <!-- delete the RealValuesRoot -->
  <xsl:template match="RealValuesRoot" />

  <xsl:template match="/path/to/node1">
    <xsl:copy><xsl:value-of select="//Value[@ID='1']" /></xsl:copy>
  </xsl:template>
  <xsl:template match="/path/to/node2">
    <xsl:copy><xsl:value-of select="//Value[@ID='2']" /></xsl:copy>
  </xsl:template>
</xsl:stylesheet>

I'm sure you can see how the specific templates could easily be auto-generated from your existing mapping using some sort of template mechanism (which could even be another XSLT).
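
For instance, a generator stylesheet along these lines could emit the templates above. This is only a sketch: it assumes the mapping is available as a standalone XML document shaped like the lookuptable element shown earlier (val elements with xpath and id attributes), and the out: namespace URI is an arbitrary placeholder.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:out="urn:x-placeholder:generated-xslt">

  <!-- elements written with the out: prefix land in the XSLT namespace in the result -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl" />
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="/lookuptable">
    <out:stylesheet version="1.0">

      <!-- fixed part: identity template and removal of RealValuesRoot -->
      <out:template match="@*|node()">
        <out:copy><out:apply-templates select="@*|node()" /></out:copy>
      </out:template>
      <out:template match="RealValuesRoot" />

      <!-- one generated template per mapping entry -->
      <xsl:for-each select="val">
        <!-- {…} are attribute value templates, filled in from the current val -->
        <out:template match="{@xpath}">
          <out:copy>
            <out:value-of select="//Value[@ID='{@id}']" />
          </out:copy>
        </out:template>
      </xsl:for-each>

    </out:stylesheet>
  </xsl:template>
</xsl:stylesheet>
```

You would run this once whenever the mapping changes and then use its output as the real transformation.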

Upvotes: 2
