Jacob Head

Reputation: 13

XSLT; finding most frequent element value in a document

Apologies if this is a very simple question; I don't use XSLT very much and I can't find much advice on the web, as there is lots of pollution in search results!

I have an XML document in the following form. Its main purpose is to be reformatted in a few ways by XSLT for display in a couple of different formats.

<desk>
<drawer>
    <contents>pencils</contents>
    <quantity>2</quantity>
</drawer>
<drawer>
    <contents>pens</contents>
    <quantity>15</quantity>
</drawer>
<drawer>
    <contents>pencils</contents>
    <quantity>3</quantity>
</drawer>
<drawer>
    <contents>rulers</contents>
    <quantity>2</quantity>
</drawer>
</desk>

I'd like to extract two pieces of information from the XML: i) the average quantity; ii) the most frequently encountered contents value by number of appearances in the XML (i.e. "pencils" because it appears twice, rather than "pens" because it has the largest quantity). The idea is that this can be piped into a very simple shell script. I therefore thought that the easiest way of getting this information would be to write a couple of short XSL style-sheets and then use xsltproc to get the data.

The first piece of information seems straight-forward. The heart of the style-sheet would be this line:

<xsl:value-of select="(sum(drawer/quantity)) div (count(drawer))" />

but I'm a bit stuck by the second.
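
For reference, a minimal complete style-sheet around that line might look like the sketch below. It assumes the template matches the root desk element and that plain-text output is what the shell script wants; neither is fixed by anything above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Plain-text output so the result can be piped into a shell script. -->
  <xsl:output method="text"/>

  <!-- Match the root <desk> so the relative paths below resolve against it. -->
  <xsl:template match="/desk">
    <!-- Total of all quantities divided by the number of drawers. -->
    <xsl:value-of select="sum(drawer/quantity) div count(drawer)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Saved as average.xsl and run as "xsltproc average.xsl desk.xml" (both file names are made up for illustration), this should print 5.5 for the sample document, since (2 + 15 + 3 + 2) / 4 = 5.5.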

I think I can use something like this to loop through a list of each distinct contents value:

<xsl:for-each select="drawer[not(contents = preceding-sibling::drawer/contents)]" />

but I'm not quite sure how to then count the number of drawer elements whose contents element matches the current value. Nor can I see an easy way of sorting by those counts so that I can get the most frequently encountered value of contents.

I have a feeling this is easier in XSLT 2.0 with its various group-by options, but unfortunately, xsltproc doesn't seem to support that. Any help would be gratefully received.

Many thanks,

Jacob

Upvotes: 1

Views: 1731

Answers (3)

annakata

Reputation: 75872

As with a great many problems solved in XSLT, I think your answer here is Muenchian grouping. Group by whatever data you're interested in; a for-each over those groups will let you use xsl:sort, and you can then do whatever you need to with the first result.

Untested, top-of-head, might-be-a-cleaner-way code:

<xsl:key name="average" match="desk/drawer/contents" use="text()"/>

<xsl:template match="/">
    <xsl:for-each select="desk/drawer/contents[generate-id() = generate-id(key('average',text())[1])]">     
        <xsl:sort select="count(//desk/drawer/contents[text()=current()])"  order="descending"/>
        <xsl:if test="position()=1">
            Most common value: "<xsl:value-of select="current()"/>" (<xsl:value-of select="count(//desk/drawer/contents[text()=current()])"/>)
        </xsl:if>       
    </xsl:for-each>
</xsl:template>
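
A self-contained variant of the same idea, as a sketch rather than a tested drop-in: it assumes text output is wanted and counts each group through the key instead of re-scanning the document. The key name by-contents is my own choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every <contents> element by its string value. -->
  <xsl:key name="by-contents" match="drawer/contents" use="."/>

  <xsl:template match="/desk">
    <!-- Muenchian grouping: visit one representative <contents> per distinct value. -->
    <xsl:for-each select="drawer/contents[generate-id() = generate-id(key('by-contents', .)[1])]">
      <!-- Order the groups by size, largest first. -->
      <xsl:sort select="count(key('by-contents', .))" data-type="number" order="descending"/>
      <!-- Emit only the value of the largest group. -->
      <xsl:if test="position() = 1">
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the sample document from the question this should print pencils. If two values tie for the highest count, only the one that appears first in the document is printed.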

Upvotes: 2

Mario Menger

Reputation: 5902

It's been a while, but I think something along these lines might work.

First, count all contents (this assumes the context node is the desk element):

<xsl:variable name="tally">
  <xsl:for-each select="drawer">
     <contents count="{count(../drawer[contents = current()/contents])}"><xsl:value-of select="contents"/></contents>
  </xsl:for-each>
</xsl:variable>

Note that duplicated entries are counted each time, so $tally would contain:

<contents count="2">pencils</contents>
<contents count="1">pens</contents>
<contents count="2">pencils</contents>
<contents count="1">rulers</contents>

Then use this to find one for which there is no other with a higher count:

<xsl:variable name="mostfrequentcontents" select="$tally/contents[not($tally/contents/@count > @count)]" />

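In XSLT 1.0, $tally is a result tree fragment, so depending on your XSLT processor you might have to convert it to a node-set using a node-set extension function before you can apply a path expression to it.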
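
Put together, the approach might look like the sketch below. It assumes the EXSLT exsl:node-set() extension function, which xsltproc supports; other processors may provide a differently named function. The variable name entries is my own, and the drawers are counted relative to the parent desk element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common">
  <xsl:output method="text"/>

  <xsl:template match="/desk">
    <!-- One <contents> entry per drawer, annotated with how often its value occurs. -->
    <xsl:variable name="tally">
      <xsl:for-each select="drawer">
        <contents count="{count(../drawer[contents = current()/contents])}">
          <xsl:value-of select="contents"/>
        </contents>
      </xsl:for-each>
    </xsl:variable>

    <!-- Convert the result tree fragment into a node-set so it can be queried. -->
    <xsl:variable name="entries" select="exsl:node-set($tally)/contents"/>

    <!-- Keep entries that no other entry beats on count, then take the first. -->
    <xsl:value-of select="$entries[not($entries/@count &gt; @count)][1]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this should print pencils. Because every entry is compared against every other entry, the work grows quadratically with the number of drawers, which is unlikely to matter for a small file.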

Upvotes: 0

Lucero

Reputation: 60276

Sorting in a for-each is done via the xsl:sort element. Just sort by the quantity and (if you only want the most frequent) add an <xsl:if test="position()=1"> element to output only the first item in the loop.

<xsl:for-each select="drawer">
   <xsl:sort select="quantity" data-type="number" order="descending"/>
   <xsl:if test="position()=1">
      Most frequent: <xsl:value-of select="contents"/> with <xsl:value-of select="quantity"/> items
   </xsl:if>
</xsl:for-each>

Upvotes: 0
