Reputation: 4864
I am working with large, complex dictionary data in XML, which is transformed with XSLT into another XML format.
What would be considered the "best" way to test whether the XSLT processes every node of the input XML?
Please consider this simple example; I think it represents the nature of the problem:
input.xml
<?xml version="1.0" encoding="UTF-8"?>
<a>
<b>
<c>
some1
<d>text2</d>
more text1
</c>
</b>
<b>
<c>
some2
<d>text2</d>
more text2
</c>
</b>
<d>text3</d>
<e>
text
<d>4</d>
</e>
</a>
some transformations.xsl
output.xml
<?xml version="1.0" encoding="UTF-8"?>
<amodified>
<bmodified>
some1
<dd>text2</dd>
more text1
</bmodified>
<bmodified>
some2
<dd>text2</dd>
more text2
</bmodified>
<dd>text3</dd>
<ed>text</ed>
<dd>4</dd>
</amodified>
In output.xml the tag names have changed, as has the order of the content (compared with the input file). I need to check that all text from the input is present in the output. I think the best solution would be to create a test that extracts the text from each tag and compares it string by string, writing the tags whose text does not appear in output.xml to a log file?
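For illustration, here is a minimal Python sketch of the check described above, using only the standard library. The function names and the inline sample documents are hypothetical, and it assumes that text survives the transformation unchanged apart from whitespace. Fragments are compared as a multiset, so reordered content and renamed tags do not cause false alarms.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def text_fragments(xml_string):
    """Count every non-empty, whitespace-normalized text fragment."""
    root = ET.fromstring(xml_string)
    fragments = Counter()
    # itertext() yields element text and tail text in document order.
    for piece in root.itertext():
        normalized = " ".join(piece.split())
        if normalized:
            fragments[normalized] += 1
    return fragments


def missing_fragments(input_xml, output_xml):
    """Fragments that occur more often in the input than in the output."""
    return text_fragments(input_xml) - text_fragments(output_xml)


# Hypothetical sample data: the output has lost "more text1".
INPUT_XML = "<a><b><c>some1<d>text2</d>more text1</c></b><d>text3</d></a>"
OUTPUT_XML = (
    "<amodified><bmodified>some1<dd>text2</dd></bmodified>"
    "<dd>text3</dd></amodified>"
)

for fragment, count in missing_fragments(INPUT_XML, OUTPUT_XML).items():
    print(f"missing {count}x: {fragment!r}")
```

In a real run the two strings would be read from input.xml and output.xml, and the loop would write to a log file instead of printing.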
Upvotes: 0
Views: 2367
Reputation: 7044
I would recommend two kinds of tests. The first is a unit test for your XSLT process, run on a small, controlled set of data that serves as a model for what you find in your large dictionary. I usually extract several representative pieces from the larger data set and store them alongside the test code. The test then applies the transformation to the test data and makes assertions about the result, verifying that the transformation was applied correctly.
In addition, you should build sanity checks into your production system so that, for example, you make sure the total number of nodes processed matches what you expect. In a dictionary with a large number of entries, you could run one step to count all the entries and another to process them. At the end, compare the number of entries you processed with the count from the first step. This is also useful because it gives you a means of displaying a progress bar (% complete).
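As a rough sketch of such a count check (Python standard library only; the function names are made up for illustration, and the tag names b and bmodified are borrowed from the example in the question), something like this could run after the transformation:

```python
import xml.etree.ElementTree as ET


def count_elements(xml_string, tag):
    """Count all elements with the given tag, at any depth."""
    root = ET.fromstring(xml_string)
    return sum(1 for _ in root.iter(tag))


def check_entry_counts(input_xml, output_xml, input_tag, output_tag):
    """Return (expected, processed, ok) for one kind of entry."""
    expected = count_elements(input_xml, input_tag)
    processed = count_elements(output_xml, output_tag)
    return expected, processed, expected == processed


# Hypothetical sample data: two entries in, two entries out.
INPUT_XML = "<a><b><c>some1</c></b><b><c>some2</c></b></a>"
OUTPUT_XML = (
    "<amodified><bmodified>some1</bmodified>"
    "<bmodified>some2</bmodified></amodified>"
)

expected, processed, ok = check_entry_counts(
    INPUT_XML, OUTPUT_XML, "b", "bmodified"
)
print(f"{processed} of {expected} entries processed, ok={ok}")
```

The same two numbers can drive the progress display: processed divided by expected gives the fraction complete.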
Anyway, that's what we do.
If the text in the output is the same as the text in the input, as in your example, Marcin, you can compare the two fairly easily using XSLT. If you process an XML file with an empty XSLT stylesheet (just the <xsl:stylesheet/> element, with no templates), the built-in template rules apply and you get back just the text, with no markup. I think xmllint can do this too. So just run that over both your input and your output and compare the results with a simple text comparison tool (like diff).
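If you would rather not shell out to an XSLT processor and diff, roughly the same comparison can be sketched in Python with the standard library. This is only an approximation of the empty-stylesheet approach, not the same thing: it joins text nodes with a space and compares word by word, so it ignores whitespace differences but would also split a word that is interrupted by inline markup. The function names and sample data are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def text_words(xml_string):
    """All text in document order, markup stripped, split into words."""
    root = ET.fromstring(xml_string)
    return " ".join(root.itertext()).split()


def text_diff(input_xml, output_xml):
    """Unified diff of the word sequences; empty if the text matches."""
    return list(
        difflib.unified_diff(
            text_words(input_xml),
            text_words(output_xml),
            fromfile="input",
            tofile="output",
            lineterm="",
        )
    )


# Hypothetical sample data: the output has lost the final "4".
INPUT_XML = "<a><d>text3</d><e>text <d>4</d></e></a>"
OUTPUT_XML = "<amodified><dd>text3</dd><ed>text</ed></amodified>"

for line in text_diff(INPUT_XML, OUTPUT_XML):
    print(line)
```

Unlike a multiset comparison, this is order-sensitive, so it only fits when the transformation keeps the text in the same order.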
Upvotes: 2
Reputation: 243459
One can use this technique:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common">
  <xsl:output method="text"/>

  <xsl:template match="/*">
    <!-- Capture the result of the processing in a variable -->
    <xsl:variable name="vrtfResults">
      <xsl:apply-templates select="num"/>
    </xsl:variable>

    <!-- Count the marker elements emitted during processing -->
    <xsl:variable name="vProcessed" select=
      "count(ext:node-set($vrtfResults)/nodeProcessed)"/>
    <!-- Count the elements that should have been processed -->
    <xsl:variable name="vAll" select="count(num)"/>

    <xsl:text>From the existing </xsl:text>
    <xsl:value-of select="$vAll"/>
    <xsl:text> &lt;num&gt; elements </xsl:text>
    <xsl:value-of select="$vProcessed"/>
    <xsl:text> were processed.</xsl:text>
  </xsl:template>

  <xsl:template match="num">
    <!-- Test-only marker: one per processed element -->
    <nodeProcessed/>
    <num><xsl:value-of select="2*."/></num>
  </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
the wanted result is produced:
From the existing 10 <num> elements 10 were processed.
Explanation:
A special test-only element (<nodeProcessed/>) has been added to the processing of every <num> element.
We capture the output in a variable, then we count the number of <nodeProcessed/> elements and compare it to the total number of <num> elements that must be processed.
Upvotes: 1