rocketboy2000
rocketboy2000

Reputation: 119

sum xml values based on more than one grouping

sorry I'm rather new to xml processing... I have the following:

<divisions>
  <division>
    <divisionName>D1</divisionName>
    <subdivisions>
      <subdivision>
        <subdivisionName>SD1</subdivisionName>
        <values>
          <value>1</value>
        </values>
      </subdivision>
      <subdivision>
        <subdivisionName>SD2</subdivisionName>
        <values>
          <value>1</value>
          <value>2</value>
        </values>
      </subdivision>
    </subdivisions>
  </division>
  <division>
    <divisionName>D2</divisionName>
    <subdivisions>
      <subdivision>
        <subdivisionName>SD3</subdivisionName>
        <values>
          <value>2</value>
          <value>2</value>
        </values>  
      </subdivision>
    </subdivisions>
  </division>
</divisions>

that I'd like to transform using either XPath or XQuery to a flat file where values are summed per division and subdivision. so for the above the outcome would be:

D1 SD1 1
D1 SD2 3
D2 SD3 4

my actual file has about 7 million lines so am interested in whether it would be necessary to use some stream form of parsing and whether XPath or XQuery would perform best.

I've attempted a number of XQuery but am finding it difficult to group by higher order loops:

for all divisions
  for all subdivisions
    print divisionName, subdivisionName, sum(values)

any insight appreciated!

Upvotes: 0

Views: 173

Answers (4)

Martin Honnen
Martin Honnen

Reputation: 167516

It is not clear whether you need to group (because there are duplicate division elements and/or duplicate subdivision elements inside a division), if you don't have duplicates you can use (XQuery 3.1)

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method 'text';
declare option output:item-separator  '&#10;';

for $d in divisions/division,
    $sd in $d/subdivisions/subdivision
return $d/divisionName/data() || ' ' || $sd/subdivisionName/data() || ' ' || sum($sd/values/value)

https://xqueryfiddle.liberty-development.net/bFukv8j

If you need to group then XQuery as a group-by clause e.g.

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method 'text';
declare option output:item-separator  '&#10;';

for $d in divisions/division
group by $dn := $d/divisionName
for $sd in $d/subdivisions/subdivision
group by $dn, $sdn := $sd/subdivisionName
return $dn || ' ' || $sdn || ' ' || sum($sd/values/value)

https://xqueryfiddle.liberty-development.net/bFukv8j/2

As for performance, you might need to check with the XQuery processor of your choice, in an XQuery data base it depends anyway on database organization and indexes I think.

XSLT 3 has streamed processing but as your elements have grouping keys in child elements you need to copy items:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:mode streamable="yes"/>

  <xsl:output method="text" />

  <xsl:template match="divisions">
     <xsl:for-each-group select="division!copy-of()!subdivisions/subdivision" composite="yes" group-by="ancestor::division/divisionName, subdivisionName">
        <xsl:value-of select="current-grouping-key(), sum(current-group()/values/value)" separator=" "/>
        <xsl:text>&#10;</xsl:text>
     </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/gWvjQeJ

Upvotes: 0

imran
imran

Reputation: 461

<xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="divisions">
        <xsl:for-each select="division">
            <xsl:for-each select="subdivisions/subdivision">
                <xsl:value-of select="ancestor::subdivisions/preceding-sibling::divisionName"/><xsl:text> </xsl:text>
                <xsl:value-of select="subdivisionName"/><xsl:text> </xsl:text>
                <xsl:value-of select="sum(values/value)"/>
                <xsl:text>&#x0a;</xsl:text>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:template>
You may try in xslt

Upvotes: 0

rocketboy2000
rocketboy2000

Reputation: 119

think I figured this out (also removes whitespace):

for $divisionName in distinct-values(//divisionName)
    for $subdivisionName in distinct-values(//subdivisionName)
        return concat($divisionName,$subdivisionName,sum(//division[divisionName = $divisionName]//subdivision[subdivisionName = $subdivisionName]//value),'&#xa;')        

Upvotes: 1

zx485
zx485

Reputation: 29022

You can use this simple XQuery. The declare statements are just for setting the right output mode.

xquery version "1.0";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization"; 
declare option output:method "text"; 
let $db := doc("test.xml")/divisions 
for $x in $db/division, $y in $x//subdivision
return concat(distinct-values($x/divisionName), ' ', distinct-values($y/subdivisionName), ' ', sum($y/values/value),'&#xa;')

Its output (tested with Saxon-9) is

D1 SD1 1
 D1 SD2 3
 D2 SD3 4

I haven't compared it to an XSLT implementation, but this query is relatively simple, so I guess that it's fast.

Upvotes: 1

Related Questions