Wokoman

Reputation: 1119

merge node only if all attributes are equal

I'm struggling a bit to get the following to work. I'm trying to merge translated nodes, but since there are sometimes minor differences between the node sets, I can't do this blindly and manual review is required. At the same time, I like to keep my life simple, so I want to automate as much as possible up front. Take the example below:

<root>
<chapter>
<string class="l1"><local xml:lang="en">Some English here</local></string>
<string class="p"><local xml:lang="en">Some other English here</local></string>
<string class="p"><local xml:lang="en">and some English here</local></string>
<string class="p"><local xml:lang="en">Some English here</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="fr">Some English translated to French here</local></string>
<string class="p"><local xml:lang="fr">Some other English translated to French here</local></string>
<string class="p"><local xml:lang="fr">and some English translated to French here</local></string>
<string class="p"><local xml:lang="fr">Some English translated to French here</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="de">Some English translated to German here</local></string>
<string class="p"><local xml:lang="de">Some other English translated to German here</local></string>
<string class="another_class"><local xml:lang="de">and some English translated to German here</local></string>
<string class="p"><local xml:lang="de">Some English translated to German here</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="nl">Some English translated to Dutch here</local></string>
<string class="p"><local xml:lang="nl">Some other English translated to Dutch here</local></string>
<string class="p"><local xml:lang="nl">and some English translated to Dutch here<br/>Some English translated to Dutch here</local></string>
</chapter>
</root>

The actual files can contain 30 languages and hundreds of nodes, so the example above is very simplified.

What I want to achieve with the example is to merge English and French, because they both have the same number of elements and all the attributes are equal as well. German should remain as is because not all attributes match, and Dutch should remain as is because the number of elements doesn't match.

So the output should look like this:

<root>
<!-- French has the same number of elements, and a full sequential match of attributes, so we can merge -->
<chapter>
<string class="l1">
    <local xml:lang="en">Some English here</local>
    <local xml:lang="fr">Some English translated to French here</local>
</string>
<string class="p">
    <local xml:lang="en">Some other English here</local>
    <local xml:lang="fr">Some other English translated to French here</local>
</string>
<string class="p">
    <local xml:lang="en">and some English here</local>
    <local xml:lang="fr">and some English translated to French here</local>
</string>
<string class="p">
    <local xml:lang="en">Some English here</local>
    <local xml:lang="fr">Some English translated to French here</local>
</string>
</chapter>
<!-- German has the same number of elements, but a different attribute sequence, so we leave it for manual review -->
<chapter>
<string class="l1"><local xml:lang="de">Some English translated to German here</local></string>
<string class="p"><local xml:lang="de">Some other English translated to German here</local></string>
<string class="another_class"><local xml:lang="de">and some English translated to German here</local></string>
<string class="p"><local xml:lang="de">Some English translated to German here</local></string>
</chapter>
<!-- Dutch has the same tag sequence but fewer elements, so we leave it for manual review -->
<chapter>
<string class="l1"><local xml:lang="nl">Some English translated to Dutch here</local></string>
<string class="p"><local xml:lang="nl">Some other English translated to Dutch here</local></string>
<string class="p"><local xml:lang="nl">and some English translated to Dutch here<br/>Some English translated to Dutch here</local></string>
</chapter>
</root>

English is always the master reference, so I can already exclude the node sets that differ in size by comparing against the English node count; I just have no idea how to also check whether all the attribute values are equal.
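To make the required check concrete, here is a minimal Python sketch (not XSLT; it uses only the standard library's xml.etree.ElementTree, and the shortened sample text plus the helper names lang and signature are assumptions for illustration). It treats the sequence of class values of a chapter's string elements as the chapter's "signature" and shows why the element count alone is not enough: German passes the count test but fails the full sequence comparison.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the sample input from the question (text content assumed).
SAMPLE = """<root>
<chapter>
<string class="l1"><local xml:lang="en">en 1</local></string>
<string class="p"><local xml:lang="en">en 2</local></string>
<string class="p"><local xml:lang="en">en 3</local></string>
<string class="p"><local xml:lang="en">en 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="fr">fr 1</local></string>
<string class="p"><local xml:lang="fr">fr 2</local></string>
<string class="p"><local xml:lang="fr">fr 3</local></string>
<string class="p"><local xml:lang="fr">fr 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="de">de 1</local></string>
<string class="p"><local xml:lang="de">de 2</local></string>
<string class="another_class"><local xml:lang="de">de 3</local></string>
<string class="p"><local xml:lang="de">de 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="nl">nl 1</local></string>
<string class="p"><local xml:lang="nl">nl 2</local></string>
<string class="p"><local xml:lang="nl">nl 3<br/>nl 4</local></string>
</chapter>
</root>"""


def lang(chapter):
    # Hypothetical helper; assumes every <local> in a chapter has the same xml:lang.
    return chapter.find("string/local").get(XML_LANG)


def signature(chapter):
    # Hypothetical helper: the class values of the <string> children, in document order.
    return tuple(s.get("class") for s in chapter.findall("string"))


root = ET.fromstring(SAMPLE)
chapters = root.findall("chapter")
master = next(c for c in chapters if lang(c) == "en")
others = [c for c in chapters if c is not master]

# Comparing only the element count lets German through.
count_only = {lang(c): len(signature(c)) == len(signature(master)) for c in others}
# Comparing the whole class sequence also checks every attribute value by position.
full_match = {lang(c): signature(c) == signature(master) for c in others}

print(count_only)  # {'fr': True, 'de': True, 'nl': False}
print(full_match)  # {'fr': True, 'de': False, 'nl': False}
```

Tuple equality covers both conditions at once: two signatures are equal only if they have the same length and the same class value at every position.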

Any advice? (using XSLT 2.0)

Thanks !

Upvotes: 1

Views: 249

Answers (2)

Dimitre Novatchev

Reputation: 243459

This transformation:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Signature of the English chapter (assumed to be the first one): its class values joined with '+' -->
 <xsl:variable name="vENSignature" select="string-join(/*/*[1]/*/@class, '+')"/>

 <!-- Identity template -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/*">
  <root>
   <!-- Group adjacent chapters by whether their signature equals the English one -->
   <xsl:for-each-group select="chapter"
        group-adjacent="string-join(*/@class, '+') eq $vENSignature">
    <xsl:choose>
     <xsl:when test="current-grouping-key()">
      <!-- Matching group: build one merged chapter, driven by the first chapter's children -->
      <chapter>
       <xsl:apply-templates select="*"/>
      </chapter>
     </xsl:when>
     <xsl:otherwise>
      <!-- Non-matching group: copy the chapters unchanged -->
      <xsl:sequence select="current-group()"/>
     </xsl:otherwise>
    </xsl:choose>
   </xsl:for-each-group>
  </root>
 </xsl:template>

 <!-- For the n-th child, collect the children of the n-th child of every chapter in the current group -->
 <xsl:template match="chapter/*">
  <xsl:variable name="vPos" select="position()"/>
  <xsl:copy>
   <xsl:sequence select="@*, current-group()/*[position() = $vPos]/*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<root>
    <chapter>
        <string class="l1">
            <local xml:lang="en">Some English here</local>
        </string>
        <string class="p">
            <local xml:lang="en">Some other English here</local>
        </string>
        <string class="p">
            <local xml:lang="en">and some English here</local>
        </string>
        <string class="p">
            <local xml:lang="en">Some English here</local>
        </string>
    </chapter>
    <chapter>
        <string class="l1">
            <local xml:lang="fr">Some English translated to French here</local>
        </string>
        <string class="p">
            <local xml:lang="fr">Some other English translated to French here</local>
        </string>
        <string class="p">
            <local xml:lang="fr">and some English translated to French here</local>
        </string>
        <string class="p">
            <local xml:lang="fr">Some English translated to French here</local>
        </string>
    </chapter>
    <chapter>
        <string class="l1">
            <local xml:lang="de">Some English translated to German here</local>
        </string>
        <string class="p">
            <local xml:lang="de">Some other English translated to German here</local>
        </string>
        <string class="another_class">
            <local xml:lang="de">and some English translated to German here</local>
        </string>
        <string class="p">
            <local xml:lang="de">Some English translated to German here</local>
        </string>
    </chapter>
    <chapter>
        <string class="l1">
            <local xml:lang="nl">Some English translated to Dutch here</local>
        </string>
        <string class="p">
            <local xml:lang="nl">Some other English translated to Dutch here</local>
        </string>
        <string class="p">
            <local xml:lang="nl">and some English translated to Dutch here
                <br/>Some English translated to Dutch here
            </local>
        </string>
    </chapter>
</root>

produces the wanted, correct result:

<root>
   <chapter>
      <string class="l1">
         <local xml:lang="en">Some English here</local>
         <local xml:lang="fr">Some English translated to French here</local>
      </string>
      <string class="p">
         <local xml:lang="en">Some other English here</local>
         <local xml:lang="fr">Some other English translated to French here</local>
      </string>
      <string class="p">
         <local xml:lang="en">and some English here</local>
         <local xml:lang="fr">and some English translated to French here</local>
      </string>
      <string class="p">
         <local xml:lang="en">Some English here</local>
         <local xml:lang="fr">Some English translated to French here</local>
      </string>
   </chapter>
   <chapter>
            <string class="l1">
                  <local xml:lang="de">Some English translated to German here</local>
            </string>
            <string class="p">
                  <local xml:lang="de">Some other English translated to German here</local>
            </string>
            <string class="another_class">
                  <local xml:lang="de">and some English translated to German here</local>
            </string>
            <string class="p">
                  <local xml:lang="de">Some English translated to German here</local>
            </string>
      </chapter>
   <chapter>
            <string class="l1">
                  <local xml:lang="nl">Some English translated to Dutch here</local>
            </string>
            <string class="p">
                  <local xml:lang="nl">Some other English translated to Dutch here</local>
            </string>
            <string class="p">
                  <local xml:lang="nl">and some English translated to Dutch here
                <br/>Some English translated to Dutch here
            </local>
            </string>
      </chapter>
</root>

Explanation:

  1. We define and use a "signature" property of a chapter: the sequence of the class attribute values of its children, joined with '+'.

  2. We group adjacent chapter elements according to whether or not their signature is equal to the "English signature".

  3. We merge the chapter elements in the group whose signature is equal to the "English signature".

  4. We copy the chapter elements in the other group unchanged.
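The grouping decision in steps 1 to 4 can be prototyped outside XSLT. Below is a minimal Python sketch, assuming the English chapter comes first (as /*/*[1] does in the stylesheet) and using itertools.groupby as a stand-in for group-adjacent; the sample text and the helper names signature and lang are hypothetical. It only computes which groups would be merged and which would be copied.

```python
import itertools
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the question's sample input (text content assumed).
SAMPLE = """<root>
<chapter>
<string class="l1"><local xml:lang="en">en 1</local></string>
<string class="p"><local xml:lang="en">en 2</local></string>
<string class="p"><local xml:lang="en">en 3</local></string>
<string class="p"><local xml:lang="en">en 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="fr">fr 1</local></string>
<string class="p"><local xml:lang="fr">fr 2</local></string>
<string class="p"><local xml:lang="fr">fr 3</local></string>
<string class="p"><local xml:lang="fr">fr 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="de">de 1</local></string>
<string class="p"><local xml:lang="de">de 2</local></string>
<string class="another_class"><local xml:lang="de">de 3</local></string>
<string class="p"><local xml:lang="de">de 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="nl">nl 1</local></string>
<string class="p"><local xml:lang="nl">nl 2</local></string>
<string class="p"><local xml:lang="nl">nl 3<br/>nl 4</local></string>
</chapter>
</root>"""


def signature(chapter):
    # Like string-join(*/@class, '+'): class values of all child elements,
    # skipping children that have no class attribute.
    return "+".join(c.get("class") for c in chapter if c.get("class") is not None)


def lang(chapter):
    # Hypothetical helper: the xml:lang of the chapter's first <local>.
    return chapter.find("string/local").get(XML_LANG)


root = ET.fromstring(SAMPLE)
chapters = root.findall("chapter")
# Like /*/*[1]: assumes the English chapter is the first one.
en_signature = signature(chapters[0])

plan = []
# groupby, like group-adjacent, starts a new group whenever the key changes.
for is_match, group in itertools.groupby(chapters, key=lambda ch: signature(ch) == en_signature):
    plan.append(("merge" if is_match else "copy", [lang(ch) for ch in group]))

print(plan)  # [('merge', ['en', 'fr']), ('copy', ['de', 'nl'])]
```

Because groupby, like group-adjacent, only groups consecutive items, in this sketch a matching chapter separated from the English one by a non-matching chapter would end up in a separate "merge" group of its own.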

Upvotes: 0

Martin Honnen

Reputation: 167471

Here is a sample XSLT 2.0 stylesheet:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- The English chapter is the master reference -->
<xsl:variable 
  name="master" 
  select="root/chapter[string/local/@xml:lang = 'en']"/>

<!-- Non-English chapters with as many strings as the master and the same class at every position -->
<xsl:variable 
  name="matches" 
  select="root/chapter[not(string/local/@xml:lang = 'en')]
    [count(string) eq count($master/string)
     and 
      (every $i in (1 to count($master/string))
       satisfies $master/string[$i]/@class eq string[$i]/@class)]"/>

<!-- Identity template -->
<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()"/>
  </xsl:copy>
</xsl:template>

<!-- The master chapter: process only its string children, so position() counts strings -->
<xsl:template match="chapter[. intersect $master]">
  <xsl:copy>
    <xsl:apply-templates select="string"/>
  </xsl:copy>
</xsl:template>

<!-- Each English string keeps its own local and gets the local of the string at the same position in every matching chapter -->
<xsl:template match="string[local/@xml:lang = 'en']">
  <xsl:variable name="pos" select="position()"/>
  <xsl:copy>
    <xsl:apply-templates select="@* | local | $matches/string[$pos]/local"/>
  </xsl:copy>
</xsl:template>

<!-- Matching chapters have been merged into the master, so drop them -->
<xsl:template match="chapter[. intersect $matches]"/>

</xsl:stylesheet>

When I apply that with Saxon 9.4 to your posted input, I get the following result:

<root>
   <chapter>
      <string class="l1">
         <local xml:lang="en">Some English here</local>
         <local xml:lang="fr">Some English translated to French here</local>
      </string>
      <string class="p">
         <local xml:lang="en">Some other English here</local>
         <local xml:lang="fr">Some other English translated to French here</local>
      </string>
      <string class="p">
         <local xml:lang="en">and some English here</local>
         <local xml:lang="fr">and some English translated to French here</local>
      </string>
      <string class="p">
         <local xml:lang="en">Some English here</local>
         <local xml:lang="fr">Some English translated to French here</local>
      </string>
   </chapter>
   <chapter>
      <string class="l1">
         <local xml:lang="de">Some English translated to German here</local>
      </string>
      <string class="p">
         <local xml:lang="de">Some other English translated to German here</local>
      </string>
      <string class="another_class">
         <local xml:lang="de">and some English translated to German here</local>
      </string>
      <string class="p">
         <local xml:lang="de">Some English translated to German here</local>
      </string>
   </chapter>
   <chapter>
      <string class="l1">
         <local xml:lang="nl">Some English translated to Dutch here</local>
      </string>
      <string class="p">
         <local xml:lang="nl">Some other English translated to Dutch here</local>
      </string>
      <string class="p">
         <local xml:lang="nl">and some English translated to Dutch here<br/>Some English translated to Dutch here</local>
      </string>
   </chapter>
</root>
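This stylesheet selects the matching chapters wherever they appear instead of relying on adjacency. The same idea as a minimal Python sketch (standard library only; the shortened sample text and the helper name lang are assumptions): Python's all() plays the role of every ... satisfies, and the positional lookup mirrors $matches/string[$pos]/local.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the question's sample input (text content assumed).
SAMPLE = """<root>
<chapter>
<string class="l1"><local xml:lang="en">en 1</local></string>
<string class="p"><local xml:lang="en">en 2</local></string>
<string class="p"><local xml:lang="en">en 3</local></string>
<string class="p"><local xml:lang="en">en 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="fr">fr 1</local></string>
<string class="p"><local xml:lang="fr">fr 2</local></string>
<string class="p"><local xml:lang="fr">fr 3</local></string>
<string class="p"><local xml:lang="fr">fr 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="de">de 1</local></string>
<string class="p"><local xml:lang="de">de 2</local></string>
<string class="another_class"><local xml:lang="de">de 3</local></string>
<string class="p"><local xml:lang="de">de 4</local></string>
</chapter>
<chapter>
<string class="l1"><local xml:lang="nl">nl 1</local></string>
<string class="p"><local xml:lang="nl">nl 2</local></string>
<string class="p"><local xml:lang="nl">nl 3<br/>nl 4</local></string>
</chapter>
</root>"""


def lang(chapter):
    # Hypothetical helper: the xml:lang of the chapter's first <local>.
    return chapter.find("string/local").get(XML_LANG)


root = ET.fromstring(SAMPLE)
chapters = root.findall("chapter")

# $master: the English chapter.
master = next(c for c in chapters if lang(c) == "en")
m_strings = master.findall("string")

# $matches: same number of strings and, like every ... satisfies,
# the same class at every position.
matches = [
    c for c in chapters
    if c is not master
    and len(c.findall("string")) == len(m_strings)
    and all(m.get("class") == s.get("class") for m, s in zip(m_strings, c.findall("string")))
]

# Append the <local> of the pos-th string of every match to the pos-th English string.
for pos, m in enumerate(m_strings):
    for c in matches:
        m.extend(c.findall("string")[pos].findall("local"))

# Like the empty template for chapter[. intersect $matches]: drop merged chapters.
for c in matches:
    root.remove(c)

# Languages of the <local> elements in each string of each remaining chapter.
summary = [[[loc.get(XML_LANG) for loc in s] for s in c] for c in root]
print(summary)
```

For the sample, the English chapter ends up with an en and a fr local in each string, while the German and Dutch chapters are left untouched, matching the XSLT output above.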

Upvotes: 1
