Reputation: 119
sorry I'm rather new to xml processing... I have the following:
<divisions>
<division>
<divisionName>D1</divisionName>
<subdivisions>
<subdivision>
<subdivisionName>SD1</subdivisionName>
<values>
<value>1</value>
</values>
</subdivision>
<subdivision>
<subdivisionName>SD2</subdivisionName>
<values>
<value>1</value>
<value>2</value>
</values>
</subdivision>
</subdivisions>
</division>
<division>
<divisionName>D2</divisionName>
<subdivisions>
<subdivision>
<subdivisionName>SD3</subdivisionName>
<values>
<value>2</value>
<value>2</value>
</values>
</subdivision>
</subdivisions>
</division>
</divisions>
that I'd like to transform using either XPath or XQuery to a flat file where values are summed per division and subdivision. so for the above the outcome would be:
D1 SD1 1
D1 SD2 3
D2 SD3 4
my actual file has about 7 million lines so am interested in whether it would be necessary to use some stream form of parsing and whether XPath or XQuery would perform best.
I've attempted a number of XQuery but am finding it difficult to group by higher order loops:
for all divisions
for all subdivisions
print divisionName, subdivisionName, sum(values)
any insight appreciated!
Upvotes: 0
Views: 173
Reputation: 167516
It is not clear whether you need to group (because there are duplicate division
elements and/or duplicate subdivision
elements inside a division
), if you don't have duplicates you can use (XQuery 3.1)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
declare option output:item-separator ' ';
for $d in divisions/division,
$sd in $d/subdivisions/subdivision
return $d/divisionName/data() || ' ' || $sd/subdivisionName/data() || ' ' || sum($sd/values/value)
https://xqueryfiddle.liberty-development.net/bFukv8j
If you need to group then XQuery as a group-by
clause e.g.
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
declare option output:item-separator ' ';
for $d in divisions/division
group by $dn := $d/divisionName
for $sd in $d/subdivisions/subdivision
group by $dn, $sdn := $sd/subdivisionName
return $dn || ' ' || $sdn || ' ' || sum($sd/values/value)
https://xqueryfiddle.liberty-development.net/bFukv8j/2
As for performance, you might need to check with the XQuery processor of your choice, in an XQuery data base it depends anyway on database organization and indexes I think.
XSLT 3 has streamed processing but as your elements have grouping keys in child elements you need to copy items:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode streamable="yes"/>
<xsl:output method="text" />
<xsl:template match="divisions">
<xsl:for-each-group select="division!copy-of()!subdivisions/subdivision" composite="yes" group-by="ancestor::division/divisionName, subdivisionName">
<xsl:value-of select="current-grouping-key(), sum(current-group()/values/value)" separator=" "/>
<xsl:text> </xsl:text>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/gWvjQeJ
Upvotes: 0
Reputation: 461
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="divisions">
<xsl:for-each select="division">
<xsl:for-each select="subdivisions/subdivision">
<xsl:value-of select="ancestor::subdivisions/preceding-sibling::divisionName"/><xsl:text> </xsl:text>
<xsl:value-of select="subdivisionName"/><xsl:text> </xsl:text>
<xsl:value-of select="sum(values/value)"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
You may try in xslt
Upvotes: 0
Reputation: 119
think I figured this out (also removes whitespace):
for $divisionName in distinct-values(//divisionName)
for $subdivisionName in distinct-values(//subdivisionName)
return concat($divisionName,$subdivisionName,sum(//division[divisionName = $divisionName]//subdivision[subdivisionName = $subdivisionName]//value),'
')
Upvotes: 1
Reputation: 29022
You can use this simple XQuery. The declare
statements are just for setting the right output mode.
xquery version "1.0";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "text";
let $db := doc("test.xml")/divisions
for $x in $db/division, $y in $x//subdivision
return concat(distinct-values($x/divisionName), ' ', distinct-values($y/subdivisionName), ' ', sum($y/values/value),'
')
Its output (tested with Saxon-9) is
D1 SD1 1
D1 SD2 3
D2 SD3 4
I haven't compared it to an XSLT implementation, but this query is relatively simple, so I guess that it's fast.
Upvotes: 1