BRiesenberg

Reputation: 63

merge xml - bring element names from doc b into doc a as attribute values, match on child text (or position?)

👋 Hello and thanks in advance for any advice!

XML A
Metadata export a
Element names are custom, reflecting local field names
Element child text identical to B in content, document order

<metadata>
   <record>
      <Title>Untitled</Title>
      <Photographer>Gordon Parks</Photographer>
      <Notes>An important photograph because (...)</Notes>
   </record>
   ...
</metadata>

XML B
Metadata export b
Element names reflect configured mapping to Dublin Core Elements/Terms
Element child text identical to A in content, document order

<metadata>
   <record>
      <title>Untitled</title>
      <creator>Gordon Parks</creator>
      <description>An important photograph because (...)</description>
   </record>
   ...
</metadata>

Desired output
Use local field names as element names
Capture DC Elements/Terms as @dc values

<metadata>
   <record>
      <Title dc="title">Untitled</Title>
      <Photographer dc="creator">Gordon Parks</Photographer>
      <Notes dc="description">An important photograph because (...)</Notes>
   </record>
   ...
</metadata>

Stylesheet as of now

    <xsl:template match="/">
        <metadata>
            <xsl:for-each select="XML_A/metadata/record">
                <record>
                    <xsl:for-each select="node()">
                        <xsl:choose>
                            <xsl:when test="name() != ''">
                            <!-- minor issue above: without this I believe I was selecting whitespace and/or other nodes...
                            ...ERROR description: "Supplied element name is a zero-length string" -->
                                <xsl:element name="{name()}">
                                    <!-- ACK -->
                                    <xsl:value-of select="."/>
                                </xsl:element>
                            </xsl:when>
                            <xsl:otherwise/>
                        </xsl:choose>
                    </xsl:for-each>
                </record>
            </xsl:for-each>
        </metadata>
    </xsl:template>

Regarding <!-- ACK -->

As I say above, I believe that the sequence of nodes with identical child text is the same in A and B. Thus, for each child node of each record in A, I think that I could use either position() or text() to match the corresponding node in B. But ...

I've tried implementing a key to match the desired metadata/record between A and B (a given ID element value, not shown in examples for XML A and B, could be used to match records).

<xsl:key name="match_xml_b" match="record" use="b_id">
...

<xsl:attribute name="dc"
   select="key('match_xml_b', a_id, document('XMLB.xml')/[text() = $a_text]/name()/>

...or...

<xsl:attribute name="dc"
   select="key('match_xml_b', a_id, document('XMLB.xml')/[position() = $a_position]/name()/>

I don't think that my syntax is correct for selecting the child node of record where text content matches text content from the current node in A (or where position matches that of the current node in A). Additionally, I'm unsure what XPath syntax to use to select the element name in B, which is what I need in my desired output.

I've also tried some clumsy matching without a key, along the lines of...

<xsl:attribute name="dc" 
   select="document('XMLB.xml')/metadata/record[b_id = a_id]/[position() = $a_position]
   (: how to use name() here? :)"/>

or...

<xsl:attribute name="dc" 
   select="document('XMLB.xml')/metadata/record[b_id = a_id]/[text() = $a_text]
   (: how to use name() here? :)"/>

...unsuccessfully.

My difficulties here include syntax for matching on a child element of record using either position() or text(), as well as retrieving the name of the element once matched.

Upvotes: 0

Views: 56

Answers (2)

Martin Honnen

Reputation: 167446

If both content and position matter you could consider a composite key e.g.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">
  
  <xsl:template match="metadata/record/*">
    <!-- count="*" numbers the element among all sibling elements;
         the default would only count siblings with the same name -->
    <xsl:variable name="pos" as="xs:integer">
      <xsl:number count="*"/>
    </xsl:variable>
    <xsl:copy>
      <xsl:attribute name="dc" select="key('meta-ref', ($pos, string()), $metadata-doc)/local-name()"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:output method="xml" indent="no"/>

  <xsl:mode on-no-match="shallow-copy"/>
  
  <!-- composite key: (sibling position, string value) of each record child -->
  <xsl:key name="meta-ref" match="metadata/record/*" composite="yes">
    <xsl:variable name="pos" as="xs:integer">
      <xsl:number count="*"/>
    </xsl:variable>
    <xsl:sequence select="$pos, string()"/>
  </xsl:key>
  
  <!-- inlined for testing, use param name="metadata-doc" select="doc('filename.xml')" in your code -->
  <xsl:param name="metadata-doc">
<metadata>
   <record>
      <title>Untitled</title>
      <creator>Gordon Parks</creator>
      <description>An important photograph because (...)</description>
   </record>
   ...
</metadata>    
  </xsl:param>

</xsl:stylesheet>
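
For the purely positional variant the question asks about, a minimal XSLT 2.0 sketch might look like the following. It assumes (this is not stated in the thread) that XML B is available as XMLB.xml next to the stylesheet, and that records and their child elements appear in exactly the same order in both exports. The variable names rec and pos are illustrative only.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- Assumption: XML B lives in XMLB.xml, resolved relative to this stylesheet -->
  <xsl:param name="xmlB" select="document('XMLB.xml')"/>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="metadata/record/*">
    <!-- 1-based index of the parent record among its record siblings -->
    <xsl:variable name="rec" as="xs:integer"
                  select="count(../preceding-sibling::record) + 1"/>
    <!-- 1-based index of this element among its sibling elements -->
    <xsl:variable name="pos" as="xs:integer"
                  select="count(preceding-sibling::*) + 1"/>
    <xsl:copy>
      <!-- name of the element at the same record/child position in B -->
      <xsl:attribute name="dc"
                     select="$xmlB/metadata/record[$rec]/*[$pos]/name()"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Compared with the attempts in the question, the invalid `record[...]/[...]` step becomes `record[$rec]/*[$pos]`, and `/name()` as the final step returns the element name. If B has no element at that position, the dc attribute comes out empty.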

Upvotes: 2

y.arazim

Reputation: 3162

If it's permissible to link the elements by matching text, you could do something similar to:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:param name="xmlB" select="document('XMLB.xml')"/>

<xsl:key name="keyB" match="*" use="text()" />

<xsl:template match="/metadata">
    <metadata>
        <xsl:for-each select="record">
            <record>
                <xsl:for-each select="*">
                    <xsl:copy>
                        <xsl:attribute name="dc" select="key('keyB', text(), $xmlB)/name()"/>
                        <xsl:apply-templates/>
                    </xsl:copy>
                </xsl:for-each>
            </record>
        </xsl:for-each>
    </metadata>
</xsl:template>

</xsl:stylesheet> 

This requires XSLT 2.0 or higher.


If all records share the same structure, it might be more efficient to perform the lookup once, for the first record, and reuse the results for the rest. Just a thought.
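
The lookup-once idea could be sketched roughly as follows. This is only an illustration under some assumptions not confirmed in the thread: XML B is in XMLB.xml next to the stylesheet, every record in A uses the same local field names, and the text values in the first record of A are distinct enough to identify their counterparts in B. The `mapping` variable and its `field` elements are hypothetical helpers.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: XML B lives in XMLB.xml, resolved relative to this stylesheet -->
  <xsl:param name="xmlB" select="document('XMLB.xml')"/>

  <xsl:key name="keyB" match="record/*" use="text()"/>

  <!-- Built once from the first record of A: one <field> per child element,
       pairing the local name with the name of the first element in B
       that has the same text -->
  <xsl:variable name="mapping">
    <xsl:for-each select="/metadata/record[1]/*">
      <field local="{name()}" dc="{key('keyB', text(), $xmlB)[1]/name()}"/>
    </xsl:for-each>
  </xsl:variable>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Every record child reuses the prebuilt mapping, looked up by element name -->
  <xsl:template match="metadata/record/*">
    <xsl:copy>
      <xsl:attribute name="dc"
                     select="$mapping/field[@local = name(current())]/@dc"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Fields that do not occur in the first record of A get an empty dc attribute with this approach, so it only suits exports where the first record is representative.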

Upvotes: 2
