Reputation: 1292
I would like to transform this xml:
<Root>
<Result>
<Message>
<Header>
<!-- Hundreds of child nodes -->
</Header>
<Body>
<Node1>
<!-- Hundreds of child nodes -->
<Node2>
<!-- Hundreds of child nodes -->
<Node3>
<!-- Hundreds of child nodes -->
<NodeX>value 1 to be changed</NodeX>
<!-- Hundreds of child nodes -->
<Node4>
<!-- Hundreds of child nodes -->
<NodeY>value 2 to be changed</NodeY>
</Node4>
</Node3>
</Node2>
</Node1>
</Body>
<RealValuesRoot>
<!-- These two nodes -->
<Value ID="1">this value must replace the value of Node X</Value>
<Value ID="2">this value must replace the value of Node Y</Value>
</RealValuesRoot>
</Message>
</Result>
<!-- Hundreds to thousands of similar MessageRoot nodes -->
</Root>
Into this xml:
<Root>
<Result>
<Message>
<Header>
<!-- Hundreds of child nodes -->
</Header>
<Body>
<Node1>
<!-- Hundreds of child nodes -->
<Node2>
<!-- Hundreds of child nodes -->
<Node3>
<!-- Hundreds of child nodes -->
<NodeX>this value must replace the value of Node X</NodeX>
<!-- Hundreds of child nodes -->
<Node4>
<!-- Hundreds of child nodes -->
<NodeY>this value must replace the value of Node Y</NodeY>
</Node4>
</Node3>
</Node2>
</Node1>
</Body>
</Message>
</Result>
<!-- Hundreds to thousands of similar MessageRoot nodes -->
</Root>
The output is almost identical to the input except for the following changes:
The "Value" nodes have unique IDs that represent unique xpaths in the body of the message, e.g. ID 1 refers to xpath /Message/Body/Node1/Node2/Node3/NodeX.
I have to use Microsoft’s xslt version 1.0!!
I already have an xslt that works fine and does everything I want, yet I am not satisfied with the performance!
My xslt works as follows:
I created a global string variable that acts like a list of key-value pairs, something like: 1:xpath1_2:xpath2_ … _N:xpathN. This variable relates the IDs of the "Value" nodes to the nodes in the message body that need to be replaced.
The xslt iterates the input xml recursively starting from the root node.
I compute the xpath for the current node then do one of the following:
As said, my xslt works fine, but I would like to improve the performance as much as possible. Please feel free to suggest a completely new xslt logic! Your ideas and suggestions are welcome!
Upvotes: 3
Views: 586
Reputation: 338128
The most efficient way to go about this is to use an XSL key.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- a key that indexes real values by their IDs -->
<xsl:key name="kRealVal" match="RealValuesRoot/Value" use="@ID" />
<!-- the identity template to copy everything -->
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<!-- ...except elements named <NodeX> -->
<xsl:template match="*[starts-with(name(), 'Node')]">
<xsl:variable name="myID" select="substring-after(name(), 'Node')" />
<xsl:variable name="myRealVal" select="key('kRealVal', $myID)" />
<xsl:copy>
<xsl:copy-of select="@*" />
<xsl:choose>
<xsl:when test="$myRealVal">
<xsl:value-of select="$myRealVal" />
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="node()" />
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:template>
<!-- the <RealValuesRoot> element can be trashed -->
<xsl:template match="RealValuesRoot" />
</xsl:stylesheet>
Here is the live preview of this solution: http://www.xmlplayground.com/R78v0n
Here is a proof-of-concept solution that uses Microsoft script extensions to do the heavy lifting:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:script="http://tempuri.org/script"
>
<msxsl:script language="JScript" implements-prefix="script">
var index = {};
function getNode(context, xpath) {
var theContext = context[0],
theXpath = xpath[0].text,
result;
try {
result = theContext.selectSingleNode(theXpath);
} catch (ex) {
// xpath is invalid. we could also just throw here,
// but let's return the empty node set instead.
result = theContext.selectSingleNode("*[false()]");
}
return result;
}
function buildIndex(id, node) {
var theNode = node[0];
if (id) index[id] = theNode;
return "";
}
function getValue(id) {
return (id in index) ? index[id] : '';
}
</msxsl:script>
<!-- this is the boilerplate to evaluate all the XPaths -->
<xsl:variable name="temp">
<xsl:for-each select="/root/source/map">
<xsl:value-of select="script:buildIndex(generate-id(script:getNode(/, @xpath)), .)" />
</xsl:for-each>
</xsl:variable>
<!-- the identity template to get things rolling -->
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<!-- actually evaluate $temp once, so the variable is being calculated -->
<xsl:value-of select="$temp" />
<xsl:apply-templates select="node() | @*" />
</xsl:template>
<!-- all <value> nodes either have a related "actual value" or are copied as they are -->
<xsl:template match="value">
<xsl:copy>
<xsl:copy-of select="@*" />
<xsl:variable name="newValue" select="script:getValue(generate-id())" />
<xsl:choose>
<xsl:when test="$newValue">
<xsl:value-of select="$newValue" />
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="node() | @*" />
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:template>
<!-- the <source> element can be dropped -->
<xsl:template match="source" />
</xsl:stylesheet>
It transforms
<root>
<value id="foo">this is to be replaced</value>
<source>
<map xpath="/root/value[@id = 'foo']">this is the new value</map>
</source>
</root>
to
<root>
<value id="foo">this is the new value</value>
</root>
Maybe you can go that route in your set-up.
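The line of thought is this: evaluate each XPath with .selectSingleNode() to find the node it points to, then use generate-id() to get an ID from a node and store the replacement value in the index under that ID, so the templates can look it up later. Tested successfully with msxsl.exe.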
Of course this assumes that the input has those <map xpath="..."> elements, but that part is not really necessary and easily adaptable to your actual situation. You could build the index object from a long string of XPaths that you split() in JavaScript, for example.
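As a rough sketch of that last idea, the mapping string from the question could be split into an ID-to-XPath object along these lines. The names parseMapping and xpathById are made up for illustration, and the code assumes the 1:xpath1_2:xpath2 format described in the question, with no underscore inside any XPath. It avoids newer syntax in the hope that JScript inside msxsl:script accepts it as well; that part is untested.

```javascript
// Hypothetical helper: turns "1:xpath1_2:xpath2" into { "1": "xpath1", "2": "xpath2" }.
// Assumption: "_" only ever separates entries and never occurs inside an XPath.
function parseMapping(mapping) {
    var result = {},
        entries = mapping.split("_"),
        i, entry, colon;
    for (i = 0; i < entries.length; i++) {
        entry = entries[i];
        // the first ":" ends the ID; later colons (axes, prefixes) stay in the XPath
        colon = entry.indexOf(":");
        if (colon < 1) continue; // skip empty or malformed entries
        result[entry.substring(0, colon)] = entry.substring(colon + 1);
    }
    return result;
}

// Example input in the format from the question (the paths are illustrative).
var xpathById = parseMapping(
    "1:/Message/Body/Node1/Node2/Node3/NodeX" +
    "_2:/Message/Body/Node1/Node2/Node3/Node4/NodeY"
);
```

Each XPath in the resulting object could then be evaluated with selectSingleNode() to fill the index, instead of reading the XPaths from <map> elements.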
Upvotes: 1
Reputation: 122364
I compute the xpath for the current node then do one of the following...
This is likely to be your inefficiency: if you're recalculating the path right back to the root every time, you're likely to be looking at an O(N²) algorithm. Without seeing your XSLT this is rather speculative, but you might be able to trim this a bit by using parameters to pass the current path down the recursion. If your main algorithm is based on a standard identity template
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
</xsl:template>
then change it to something like
<xsl:template match="@*|node()">
<xsl:param name="curPath" />
<xsl:copy>
<xsl:apply-templates select="@*|node()">
<xsl:with-param name="curPath" select="concat($curPath, '/', name())" />
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
or whatever your logic is for building the paths you need. Now in your specific templates for the nodes you want to massage you already have the path to their parent node and you don't have to walk all the way up to the root every time.
You might need to add a
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
so you don't get a double slash at the front of $curPath.
The xpaths are hardcoded in the xslt as a string in a global variable
If instead of a string you represented this mapping in an XML structure then you could make use of the key mechanism to speed up your lookups:
<xsl:variable name="rtfLookupTable">
<lookuptable>
<val xpath="/first/xpath/expression" id="1" />
<val xpath="/second/xpath/expression" id="2" />
<!-- ... -->
</lookuptable>
</xsl:variable>
<xsl:variable name="lookupTable" select="msxsl:node-set($rtfLookupTable)" />
<xsl:key name="valByXpath" match="val" use="@xpath" />
(add xmlns:msxsl="urn:schemas-microsoft-com:xslt" to your xsl:stylesheet). Or, if you want to avoid using the node-set extension function, an alternative definition could be
<xsl:variable name="lookupTable" select="document('')//xsl:variable[@name='rtfLookupTable']" />
which works by treating the stylesheet itself as a plain XML document.
Keys across multiple documents get a bit fiddly in XSLT 1.0, but it can be done. Essentially you have to switch the current context to point to the $lookupTable before calling the key function, so you need to save the current context in variables to let you refer to it later:
<xsl:template match="text()">
<xsl:param name="curPath" />
<xsl:variable name="dot" select="." />
<xsl:variable name="slash" select="/" />
<xsl:for-each select="$lookupTable">
<xsl:variable name="valId" select="key('valByXpath', $curPath)/@id" />
<xsl:choose>
<xsl:when test="$valId">
<xsl:value-of select="$slash//Value[@ID = $valId]" />
<!-- or however you extract the right Value -->
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$dot" />
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
Or indeed, why not just let the XSLT engine do the hard work for you? Rather than representing your mapping as a string
1:/path/to/node1_2:/path/to/node2
represent it as templates directly:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- copy everything as-is apart from exceptions below -->
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
</xsl:template>
<!-- delete the RealValuesRoot -->
<xsl:template match="RealValuesRoot" />
<xsl:template match="/path/to/node1">
<xsl:copy><xsl:value-of select="//Value[@ID='1']" /></xsl:copy>
</xsl:template>
<xsl:template match="/path/to/node2">
<xsl:copy><xsl:value-of select="//Value[@ID='2']" /></xsl:copy>
</xsl:template>
</xsl:stylesheet>
I'm sure you can see how the specific templates could easily be auto-generated from your existing mapping using some sort of template mechanism (which could even be another XSLT).
Upvotes: 2