eoglasi

Reputation: 169

Rename duplicate attributes with xslt

I ran into a problem again today. I have an XML document with about 1000 elements named book. Each element has an attribute, but some attribute values are duplicated.

So I have this XML:

... some other not duplicated attribute data ...
<book attribute="attr1"></book>
<book attribute="attr1"></book>
<book attribute="attr1"></book>
... some other not duplicated attribute data ...
<book attribute="attr2"></book>
<book attribute="attr2"></book>
<book attribute="attr2"></book>
... some other not duplicated attribute data ...

Is there a way, using XSLT, to rename the attribute values that occur more than once in the XML, like this:

... some other not duplicated attribute data...
<book attribute="attr1-1"></book>
<book attribute="attr1-2"></book>
<book attribute="attr1-3"></book>
... some other not duplicated attribute data ...
<book attribute="attr2-1"></book>
<book attribute="attr2-2"></book>
<book attribute="attr2-3"></book>
... some other not duplicated attribute data ...

I hope this is possible with XSLT, and that non-duplicated attribute values can stay the same. Thanks a lot for all the answers, eoglasi

Upvotes: 2

Views: 710

Answers (3)

Mathias Müller

Reputation: 22617

Input XML:

<?xml version="1.0" encoding="utf-8"?>
<test>
 <book attribute="attr1"></book>
 <book attribute="attr1"></book>
 <book attribute="attr1"></book>
 <book attribute="attr2"></book>
 <book attribute="attr2"></book>
 <book attribute="attr2"></book>
 <book attribute="attr5"></book>
</test>

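The following XSLT 2.0 stylesheet should do the job. Essentially, it groups the book elements by the attribute called "attribute" and checks whether a group consists of only 1 item (i.e. whether the attribute value is unique). Unique values are copied unchanged; values in larger groups get a position suffix.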

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="test">
  <xsl:copy>
     <xsl:for-each-group select="book" group-by="@attribute">
        <xsl:choose>
           <xsl:when test="count(current-group()) = 1">
              <xsl:element name="book">
                 <xsl:attribute name="attribute">
                    <xsl:value-of select="@attribute"/>
                 </xsl:attribute>
              </xsl:element>
           </xsl:when>
           <xsl:otherwise>
              <xsl:for-each select="current-group()">
                 <xsl:element name="book">
                    <xsl:attribute name="attribute">
                       <xsl:value-of select="concat(current-grouping-key(), '-', position())"/>
                    </xsl:attribute>
                 </xsl:element>
              </xsl:for-each>
           </xsl:otherwise>
        </xsl:choose>
     </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

You get the following output (I have included 1 unique attribute value in the input file):

<test>
 <book attribute="attr1-1"/>
 <book attribute="attr1-2"/>
 <book attribute="attr1-3"/>
 <book attribute="attr2-1"/>
 <book attribute="attr2-2"/>
 <book attribute="attr2-3"/>
 <book attribute="attr5"/>
</test>

EDIT: Note that this will reorder non-adjacent book elements with the same attribute value.
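
If the original order matters, one possible workaround is to drop the grouping and number each book by counting the earlier books with the same value. The following is only a sketch, assuming an XSLT 2.0 processor; the key name bookByAttribute and the variable name book are illustrative choices, not part of the stylesheet above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical key name: looks up book elements by their attribute value -->
  <xsl:key name="bookByAttribute" match="book" use="@attribute"/>

  <!-- Identity template: everything not handled below is copied as-is,
       so elements stay in document order -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Matches only attributes whose value is shared by two or more books -->
  <xsl:template match="book/@attribute[count(key('bookByAttribute', .)) gt 1]">
    <xsl:variable name="book" select=".."/>
    <!-- Suffix = number of same-valued books that precede this one, plus 1 -->
    <xsl:attribute name="attribute"
        select="concat(., '-', count(key('bookByAttribute', .)[. &lt;&lt; $book]) + 1)"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the identity template copies everything else, this variant also keeps elements other than book, which the grouping stylesheet above does not output.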

Upvotes: 1

Ian Roberts

Reputation: 122394

One way to check whether an attribute is a duplicate is to define a key to look up book elements by their attribute value and then have special handling for the case where the key lookup gives you more than one result:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:key name="bookByAttribute" match="book" use="@attribute" />

  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>

  <xsl:template match="book/@attribute[key('bookByAttribute', .)[2]]">
    <xsl:attribute name="attribute">
      <!-- logic to create a de-duplicated value -->
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>

Books whose attribute is not repeated will not be affected by this template. The simplest way to generate de-duplicated values would be to use generate-id() directly, as Vincent suggests. But if you really need sequential numbers (and you can guarantee that adding them won't itself cause duplication, as it would if the original document already contained both foo and foo-1), then you could use a trick like this:

  <xsl:template match="book/@attribute[key('bookByAttribute', .)[2]]">
    <xsl:variable name="myId" select="generate-id(..)" />
    <xsl:attribute name="attribute">
      <xsl:value-of select="." />
      <xsl:text>-</xsl:text>
      <xsl:for-each select="key('bookByAttribute', .)">
        <xsl:if test="generate-id() = $myId">
          <xsl:value-of select="position()" />
        </xsl:if>
      </xsl:for-each>
    </xsl:attribute>
  </xsl:template>

The for-each is essentially finding the position in document order of the current book within the set of nodes that share the same attribute value.
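
The same position can also be computed without the for-each, by counting the earlier book elements that carry the same value. This is a sketch of that alternative, not a replacement for the template above. It assumes book elements are never nested inside one another, because the preceding axis skips ancestors.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:key name="bookByAttribute" match="book" use="@attribute" />

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>

  <!-- Only fires when at least two books share this attribute value -->
  <xsl:template match="book/@attribute[key('bookByAttribute', .)[2]]">
    <xsl:attribute name="attribute">
      <!-- current() is the attribute being processed; the count is the number
           of books earlier in the document with the same value -->
      <xsl:value-of select="concat(., '-', count(../preceding::book[@attribute = current()]) + 1)" />
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

The expression is shorter, but it walks the preceding axis for every duplicated attribute, so the key-based for-each is likely the better choice for large documents.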

Upvotes: 3

Vincent Biragnet

Reputation: 2998

If you're not bound to a specific pattern for your attribute values, there's a dedicated function that creates a unique id for each node in the input file: generate-id. In your case, you may use it like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="book/@attribute">
        <xsl:attribute name="attribute">
            <xsl:value-of select="concat(., '-', generate-id())"/>
        </xsl:attribute>
    </xsl:template>
</xsl:stylesheet>

For this XML:

<test>
    ... some other not duplicated attribute data ...
    <book attribute="attr1"></book>
    <book attribute="attr1"></book>
    <book attribute="attr1"></book>
    ... some other not duplicated attribute data ...
    <book attribute="attr2"></book>
    <book attribute="attr2"></book>
    <book attribute="attr2"></book>
    ... some other not duplicated attribute data ...
</test>

you get something like the following (the exact generated ids depend on the XSLT processor):

<test>
    ... some other not duplicated attribute data ...
    <book attribute="attr1-d0e3_a0"/>
    <book attribute="attr1-d0e5_a1"/>
    <book attribute="attr1-d0e7_a2"/>
    ... some other not duplicated attribute data ...
    <book attribute="attr2-d0e9_a3"/>
    <book attribute="attr2-d0e11_a4"/>
    <book attribute="attr2-d0e13_a5"/>
    ... some other not duplicated attribute data ...
</test>
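
Note that the stylesheet above appends an id to every book attribute, including values that occur only once. If unique values should stay untouched, the match pattern could be narrowed with a key. This is a minimal sketch under that assumption; the key name bookByAttribute is an illustrative choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <!-- Hypothetical key name: indexes book elements by their attribute value -->
    <xsl:key name="bookByAttribute" match="book" use="@attribute"/>

    <!-- Identity template: copy everything that is not handled below -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only attributes whose value occurs on more than one book get a suffix -->
    <xsl:template match="book/@attribute[count(key('bookByAttribute', .)) &gt; 1]">
        <xsl:attribute name="attribute">
            <xsl:value-of select="concat(., '-', generate-id())"/>
        </xsl:attribute>
    </xsl:template>
</xsl:stylesheet>
```

With this pattern, a book whose attribute value is unique falls through to the identity template and is copied unchanged.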

Upvotes: 1
