user8002385

Merge info from two XML files into one using XSLT

File a.xml:

<?xml version="1.0" encoding="UTF-8"?>
<TABLE NAME="pivot.cs">
   <DATA RECORDS="2">
      <RECORD ID="1">
         <INTERNALID>5510</INTERNALID>
         <SOMED>1</SOMED>
         <PEMED>1</PEMED>
         <CODAL>PLACEHOLD</CODAL>
      </RECORD>
      <RECORD ID="2">
         <INTERNALID>5511</INTERNALID>
         <SOMED>1</SOMED>
         <PEMED>1</PEMED>
         <CODAL>PLACEHOLD</CODAL>
      </RECORD>
      <INTERNALID>5537</INTERNALID>
      <SOMED>1</SOMED>
      <PEMED>1</PEMED>
      <CODAL>PLACEHOLD</CODAL>
   </DATA>
</TABLE>

File b.xml:

<?xml version="1.0" encoding="UTF-8"?>
<TABLE NAME="ALT.CS">
   <DATA RECORDS="20">
      <RECORD ID="53">
         <RECNO>5510</RECNO>
         <TOBEEXTRACTED>TIM</TOBEEXTRACTED>
      </RECORD>
      <RECORD ID="53">
         <RECNO>5510</RECNO>
         <TOBEEXTRACTED>KLM</TOBEEXTRACTED>
      </RECORD>
      <RECORD ID="54">
         <RECNO>5510</RECNO>
         <TOBEEXTRACTED>KAB</TOBEEXTRACTED>
      </RECORD>
      <RECORD ID="55">
         <RECNO>5511</RECNO>
         <TOBEEXTRACTED>BUS WEE</TOBEEXTRACTED>
      </RECORD>
      <RECORD ID="59">
         <RECNO>5512</RECNO>
      </RECORD>
      <RECORD ID="60">
         <RECNO>5513</RECNO>
      </RECORD>
      <RECORD ID="5511">
         <RECNO>5598</RECNO>
         <TOBEEXTRACTED>FBV</TOBEEXTRACTED>
      </RECORD>
   </DATA>
</TABLE>

The output file should be a.xml, but with the TOBEEXTRACTED element text appended in square brackets to CODAL when the INTERNALID matches a RECNO one or two times:

<?xml version="1.0" encoding="UTF-8"?>
<TABLE NAME="pivot.cs">
   <DATA RECORDS="2">
      <RECORD ID="1">
         <INTERNALID>5510</INTERNALID>
         <SOMED>1</SOMED>
         <PEMED>1</PEMED>
         <CODAL>PLACEHOLD</CODAL>
      </RECORD>
      <RECORD ID="2">
         <INTERNALID>5511</INTERNALID>
         <SOMED>1</SOMED>
         <PEMED>1</PEMED>
         <CODAL>PLACEHOLD [BUS WEE]</CODAL>
      </RECORD>
      <INTERNALID>5537</INTERNALID>
      <SOMED>1</SOMED>
      <PEMED>1</PEMED>
      <CODAL>PLACEHOLD</CODAL>
   </DATA>
</TABLE>

It would also help a lot to get a text file as output that reports the following for the records of a.xml:

INTERNALID: 5511 (and all the rest in a normal xml file) was matched.
INTERNALID: 5510 was matched more than two times, so no join took place.
INTERNALID: 5537 did not match
RECNO 5512 did not have a TOBEEXTRACTED element.

Upvotes: 0

Views: 655

Answers (2)

Martin Honnen

Reputation: 167716

If you use a key as suggested in a comment you can reference and match elements as follows:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:param name="doc2">
        <TABLE NAME="ALT.CS">
            <DATA RECORDS="20">
                <RECORD ID="53">
                    <RECNO>5510</RECNO>
                    <TOBEEXTRACTED>TIM</TOBEEXTRACTED>
                </RECORD>
                <RECORD ID="53">
                    <RECNO>5510</RECNO>
                    <TOBEEXTRACTED>KLM</TOBEEXTRACTED>
                </RECORD>
                <RECORD ID="54">
                    <RECNO>5510</RECNO>
                    <TOBEEXTRACTED>KAB</TOBEEXTRACTED>
                </RECORD>
                <RECORD ID="55">
                    <RECNO>5511</RECNO>
                    <TOBEEXTRACTED>BUS WEE</TOBEEXTRACTED>
                </RECORD>
                <RECORD ID="59">
                    <RECNO>5512</RECNO>
                </RECORD>
                <RECORD ID="60">
                    <RECNO>5513</RECNO>
                </RECORD>
                <RECORD ID="5511">
                    <RECNO>5598</RECNO>
                    <TOBEEXTRACTED>FBV</TOBEEXTRACTED>
                </RECORD>
            </DATA>
        </TABLE>
    </xsl:param>

    <xsl:key name="ref" match="DATA/RECORD[TOBEEXTRACTED]" use="RECNO"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="DATA/RECORD[key('ref', INTERNALID, $doc2)]/CODAL">
        <xsl:copy>
            <xsl:apply-templates select="node(), key('ref', ../INTERNALID, $doc2)/TOBEEXTRACTED"/>  
        </xsl:copy>
    </xsl:template>

    <xsl:template match="DATA/RECORD[not(key('ref', INTERNALID, $doc2))]"/>

    <xsl:template match="TOBEEXTRACTED">
        <xsl:value-of select="concat(' [', ., ']')"/>
    </xsl:template>

</xsl:transform>

That gives the output you have posted; see http://xsltransform.net/a9Giwy. There I have used an xsl:param name="doc2" with inline contents, but you can of course use <xsl:param name="doc2" select="doc('b.xml')"/> instead.
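The stylesheet above appends the text of every matching record. If the join should be skipped when more than two records match, as the question's expected output suggests, the CODAL template can count the key hits first. The following is only a sketch under assumptions: b.xml is well-formed and sits next to the stylesheet, and "matched" means a RECORD in b.xml that has a TOBEEXTRACTED child. The variable name hits is made up for illustration.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <!-- Assumption: b.xml is resolved relative to the stylesheet; pass another document to override. -->
    <xsl:param name="doc2" select="doc('b.xml')"/>

    <!-- Index the RECORDs of b.xml that carry a TOBEEXTRACTED child by their RECNO. -->
    <xsl:key name="ref" match="DATA/RECORD[TOBEEXTRACTED]" use="RECNO"/>

    <!-- Identity template: copy everything that is not handled below. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Append the bracketed values only when one or two records match. -->
    <xsl:template match="DATA/RECORD/CODAL">
        <xsl:variable name="hits" select="key('ref', ../INTERNALID, $doc2)"/>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
            <xsl:if test="count($hits) = (1, 2)">
                <xsl:for-each select="$hits/TOBEEXTRACTED">
                    <xsl:value-of select="concat(' [', ., ']')"/>
                </xsl:for-each>
            </xsl:if>
        </xsl:copy>
    </xsl:template>

</xsl:transform>
```

Unlike the stylesheet above, this variant keeps records without a match and copies them unchanged, which may or may not be what you want.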

As the question was additionally tagged as XSLT 3.0 in an edit, I have also tried to implement it using the xsl:merge instruction of that version:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:param name="doc2-uri" as="xs:string" select="'test201705120102.xml'"/>

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:output indent="yes"/>

    <xsl:template match="TABLE/DATA">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:merge>
                <xsl:merge-source name="internal" select="RECORD" >
                    <xsl:merge-key select="INTERNALID"/>
                </xsl:merge-source>
                <xsl:merge-source name="recno" select="doc($doc2-uri)//RECORD">
                    <xsl:merge-key select="RECNO"/>
                </xsl:merge-source>
                <xsl:merge-action>
                    <xsl:if test="current-merge-group('internal') and current-merge-group('recno')">
                        <xsl:copy>
                            <xsl:copy-of select="@*, * except CODAL"/>
                            <CODAL>
                                <xsl:value-of select="CODAL, current-merge-group('recno')/TOBEEXTRACTED/('[' || . || ']')"/>
                            </CODAL>
                        </xsl:copy>
                    </xsl:if>
                </xsl:merge-action>
            </xsl:merge>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Upvotes: 0

Michael Kay

Reputation: 163595

This kind of merging can often be accomplished using xsl:for-each-group:

<xsl:for-each-group select="$doc1//RECORD, $doc2//RECORD" group-by="INTERNALID, RECNO">
  ...
</xsl:for-each-group>

In the body, current-group() holds the records from both files that share the current key. You can separate them out with, for example:

<xsl:variable name="doc1rec" select="current-group()[(/) is $doc1]"/>
<xsl:variable name="doc2rec" select="current-group()[(/) is $doc2]"/>

and then the remaining processing should be straightforward if you understand the logic (which I don't).
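As one possible reading of that remaining logic, here is a sketch that uses the grouping to write the text report the question asks for. It rests on assumptions not stated in this answer: a.xml is the principal input, b.xml is well-formed and sits next to the stylesheet, records are RECORD elements keyed by INTERNALID in a.xml and by RECNO in b.xml, and a match is counted per TOBEEXTRACTED element. The variable names id and values are made up for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output method="text"/>

    <!-- Assumptions: a.xml is the principal input, b.xml is resolved relative to the stylesheet. -->
    <xsl:variable name="doc1" select="/"/>
    <xsl:param name="doc2" select="doc('b.xml')"/>

    <xsl:template match="/">
        <!-- Each RECORD contributes either its INTERNALID (a.xml) or its RECNO (b.xml) as grouping key. -->
        <xsl:for-each-group select="$doc1//RECORD, $doc2//RECORD" group-by="INTERNALID, RECNO">
            <xsl:variable name="id" select="current-grouping-key()"/>
            <xsl:variable name="doc1rec" select="current-group()[(/) is $doc1]"/>
            <xsl:variable name="doc2rec" select="current-group()[(/) is $doc2]"/>
            <xsl:variable name="values" select="$doc2rec/TOBEEXTRACTED"/>
            <xsl:choose>
                <!-- Key occurs only in a.xml. -->
                <xsl:when test="empty($doc2rec)">
                    <xsl:value-of select="concat('INTERNALID: ', $id, ' did not match&#10;')"/>
                </xsl:when>
                <!-- Key occurs in b.xml, but none of its records has a value to extract. -->
                <xsl:when test="empty($values)">
                    <xsl:value-of select="concat('RECNO ', $id, ' did not have a TOBEEXTRACTED element.&#10;')"/>
                </xsl:when>
                <!-- Key occurs only in b.xml and has a value: nothing to report. -->
                <xsl:when test="empty($doc1rec)"/>
                <xsl:when test="count($values) gt 2">
                    <xsl:value-of select="concat('INTERNALID: ', $id, ' was matched more than two times, so no join took place.&#10;')"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="concat('INTERNALID: ', $id, ' was matched.&#10;')"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:template>

</xsl:stylesheet>
```

Because only RECORD elements are grouped, an INTERNALID that sits directly under DATA (such as 5537 in the sample a.xml) is not reported; selecting the key elements themselves instead of the records would be one way to cover that case.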

Upvotes: 0
