bilak

Reputation: 4922

XSLT: deduplicate values by substring

I have the following categories:

<categories>
    <category>anotherparent</category>
    <category>parent</category>
    <category>parent/child1</category>
    <category>parent/child1/subchild1</category>
    <category>parent/child2</category>
    <category>parent/child3/</category>
    <category>parent/child3/subchild3</category>
</categories>

The problem is that the category paths are "duplicated": each parent path appears again as a prefix of its children. I'd like to remove every category that is a prefix of another one and keep only the most specific paths, so the result should look like this:

<categories>
    <category>anotherparent</category>
    <category>parent/child1/subchild1</category>
    <category>parent/child2</category>
    <category>parent/child3/subchild3</category>
</categories>

I could write a Java extension for this, but I can't find a suitable built-in method or function in XSLT, and I'm pretty sure it should be easy.

The solution can use XSLT 2 or 3.

Upvotes: 0

Views: 57

Answers (2)

Martin Honnen

Reputation: 167471

Perhaps

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    expand-text="yes"
    exclude-result-prefixes="#all"
    xmlns:mf="http://example.com/mf"
    version="3.0">
  
  <xsl:function name="mf:group" as="element(category)*">
    <xsl:param name="cats"/>
    <xsl:param name="level"/>
    <xsl:choose>
      <xsl:when test="$cats?2[$level]">
        <xsl:for-each-group select="$cats[?2[$level]]" group-by="?2[$level]">
          <xsl:sequence select="mf:group(current-group(), $level + 1)"/>
        </xsl:for-each-group>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="$cats?1"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:output indent="yes"/>

  <xsl:template match="categories">
    <xsl:copy>
      <xsl:sequence select="mf:group(category ! [., tokenize(., '/')], 1)"/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

helps. It assumes, as the comment asks, that the trailing / in <category>parent/child3/</category> is a typo and that the element should be <category>parent/child3</category>. If parent/child3/ can occur but should be treated as parent/child3, use tokenize(., '/')[normalize-space()] instead of tokenize(., '/').
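
To illustrate why the trailing slash matters, here is a minimal sketch that is separate from the stylesheet above. It only counts the tokens that tokenize produces for the path parent/child3/ with and without the [normalize-space()] predicate. The entry point xsl:initial-template is the standard XSLT 3.0 default initial template; how you invoke it depends on your processor.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <xsl:output method="text"/>

  <!-- Illustration only: start the transformation at the initial template, no source document needed. -->
  <xsl:template name="xsl:initial-template">
    <!-- A trailing separator yields a final zero-length token: 'parent', 'child3', ''. -->
    <xsl:value-of select="count(tokenize('parent/child3/', '/'))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The predicate keeps only tokens that are non-empty after whitespace normalization. -->
    <xsl:value-of select="count(tokenize('parent/child3/', '/')[normalize-space()])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The first count is 3 and the second is 2. In the grouping function that extra empty token gives parent/child3/ a third level of its own, so it is no longer recognized as a parent of parent/child3/subchild3.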

It might be cleaner to pass the function a sequence of maps with two entries each instead of a sequence of arrays of size 2:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    expand-text="yes"
    exclude-result-prefixes="#all"
    xmlns:mf="http://example.com/mf"
    version="3.0">
  
  <xsl:function name="mf:group" as="element(category)*">
    <xsl:param name="cats" as="map(xs:string, item()*)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:choose>
      <xsl:when test="$cats?tokens[$level]">
        <xsl:for-each-group select="$cats[?tokens[$level]]" group-by="?tokens[$level]">
          <xsl:sequence select="mf:group(current-group(), $level + 1)"/>
        </xsl:for-each-group>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="$cats?cat"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:output indent="yes"/>

  <xsl:template match="categories">
    <xsl:copy>
      <xsl:sequence select="mf:group(category ! map { 'cat' : ., 'tokens' : tokenize(., '/') }, 1)"/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

Again, it might be necessary to use tokenize(., '/')[normalize-space()] instead of tokenize(., '/') if leading, trailing, or doubled slashes can occur but should be ignored.

Upvotes: 1

Sebastien

Reputation: 2714

If your input XML is always in the format you posted, that is, sorted so that each parent path is immediately followed by one of its descendants, this works:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="category[starts-with(following-sibling::category[1],.)]"/>

</xsl:stylesheet>

See it working here: https://xsltfiddle.liberty-development.net/gVrvcxY
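
If the categories are not guaranteed to be sorted, a variant of the same idea can compare each category against all of its siblings instead of only the next one. The following is a sketch under that assumption, not a tested drop-in replacement. The variable name $prefix is introduced here for illustration. It strips any trailing slashes and appends a single one, so that parent only counts as a prefix of parent/child1 and not of a sibling such as parentx.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Drop a category when some sibling path extends it by at least one more character
       after the separator; $prefix is the category's own path with exactly one trailing slash. -->
  <xsl:template match="category[
      let $prefix := replace(., '/+$', '') || '/'
      return ../category[starts-with(., $prefix)
                         and string-length(.) gt string-length($prefix)]]"/>

</xsl:stylesheet>
```

With the sample input from the question this should keep anotherparent, parent/child1/subchild1, parent/child2 and parent/child3/subchild3, and it also removes parent/child3/ despite its trailing slash. The cost is that every category is compared with every sibling, which is quadratic in the number of categories.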

Upvotes: 0
