Azaghal

Reputation: 430

XSL: Only copy nodes which have a duplicate attribute value in a large XML file

Here is the type of input I have:

<modifs>
   <test>
      <ref id="1"/>
      <ref id="2"/>
      <ref id="800"/>
      <ref id="5000"/>
      <ref id="10000"/>
      <ref id="40000"/>
   </test>
   <modif id="1">content1</modif>
   <modif id="2">content2</modif>
   <modif id="3">content3</modif>
   <modif id="4">content4</modif>
</modifs>

In this input, the first list ("ref") is just a list of ids, and the second list ("modif") is what really matters to me. I want to output only the "modif" nodes whose "id" attribute matches an "id" in the "ref" list. For the input above, the result would be just this:

<modifs>
    <modif id="1">content1</modif>
    <modif id="2">content2</modif>
</modifs>

I made an attempt at this, which was partly successful. For every "modif" node, I check whether its @id matches the @id of a preceding "ref" node:

  <xsl:template match="modifs">
    <xsl:for-each select="./modif">
      <xsl:if test="./@id=preceding::ref/@id">
        <xsl:element name="{local-name()}">
          <xsl:for-each select="attribute::*">
            <xsl:attribute name="{local-name()}">
              <xsl:value-of select="."/>
            </xsl:attribute>
          </xsl:for-each>
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

However, I'm currently working on a very large XML file (200k+ ids), and this method is far too slow: it works well for the first thousand ids but takes hours or days beyond that. I'm pretty sure there's a way around it (with Muenchian grouping?), but I really don't understand what I'm supposed to do.

If someone could give me a hand / explain how I should proceed, that would be great.

Thanks

Upvotes: 0

Views: 52

Answers (1)

Tim C

Reputation: 70648

You don't need Muenchian grouping here, but you should use a key to look up whether a matching ref element exists:

  <xsl:key name="ref" match="ref" use="@id" />

Then, in conjunction with the XSLT identity template, you just need an empty template that matches the modif elements with no corresponding ref element, so that they are dropped from the output:

<xsl:template match="modif[not(key('ref', @id))]"/>

Try this XSLT instead (the empty template matching test also removes the list of ref elements from the output):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <xsl:strip-space elements="*"/>
  <xsl:key name="ref" match="ref" use="@id" />

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="modif[not(key('ref', @id))]"/>

  <xsl:template match="test" />
</xsl:stylesheet>
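
If the output only ever needs the modifs wrapper and the matching modif elements, the same key can also be used in the positive sense. The following is a minimal sketch under that assumption, not something the stylesheet above requires: it assumes the modif elements are direct children of the root modifs element, as in the sample input, and copies each kept element unchanged with xsl:copy-of.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Index every ref element by its id attribute -->
  <xsl:key name="ref" match="ref" use="@id" />

  <!-- Assumption: modifs is the root element and modif elements are its direct children -->
  <xsl:template match="/modifs">
    <modifs>
      <!-- key('ref', @id) returns a non-empty node-set, and so tests as true,
           only when some ref has the same id as the current modif -->
      <xsl:copy-of select="modif[key('ref', @id)]" />
    </modifs>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this keeps only the modif elements with id 1 and 2. Either way, the speed-up comes from the key: the processor can look up ref elements by id instead of scanning every preceding node for each modif.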

Upvotes: 1
