samsulfreude

Reputation: 99

Combine multiple XML files using XSLT and do some math function

I have an XML file authors.xml as below:

<?xml version="1.0" encoding="ISO-8859-1"?>
<authors>
  <author>
    <name>Leonardo da Vinci</name>
    <nationality>Italian</nationality>
  </author>
  <author>
    <name>Pablo Picasso</name>
    <nationality>Spanish</nationality>
  </author>
</authors>

and another file listing their artworks artworks.xml as below:

<?xml version="1.0" encoding="ISO-8859-1"?>
<artworks>
  <artwork>
    <title>Mona Lisa</title>
    <author>Leonardo da Vinci</author>
    <date>1497</date>
    <form>painting</form>
  </artwork>
  <artwork>
    <title>Vitruvian Man</title>
    <author>Leonardo da Vinci</author>
    <date>1499</date>
    <form>painting</form>
  </artwork>
  <artwork>
    <title>Absinthe Drinker</title>
    <author>Pablo Picasso</author>
    <date>1479</date>
    <form>painting</form>
  </artwork>
  <artwork>
    <title>Chicago Picasso</title>
    <author>Pablo Picasso</author>
    <date>1950</date>
    <form>sculpture</form>
  </artwork>
</artworks>

What I want to do is combine these two XML files into a single processed XML file. The XSLT should list all authors and, within each author, list the artworks associated with that author, grouped by artwork form. For each group it should also count the artworks and record the group's date range (earliest to latest year); both values are added as attributes on the group element. This is illustrated in the XML file below:

<?xml version="1.0" encoding="UTF-8" ?>
<authors>
  <author>
    <name>Leonardo da Vinci</name>
    <nationality>Italian</nationality>
    <artworks form="painting" duration="1497-1499" quantity="2">
      <artwork date="1497">
        <title>Mona Lisa</title>
      </artwork>
      <artwork date="1499">
        <title>Vitruvian Man</title>
      </artwork>
    </artworks>
  </author>
  <author>
    <name>Pablo Picasso</name>
    <nationality>Spanish</nationality>
    <artworks form="painting" duration="1479-1479" quantity="1">
      <artwork date="1479">
        <title>Absinthe Drinker</title>
      </artwork>
    </artworks>
    <artworks form="sculpture" duration="1950-1950" quantity="1">
      <artwork date="1950">
        <title>Chicago Picasso</title>
      </artwork>
    </artworks>
  </author>
</authors>

I am still new to this. So far I have managed to output the author part, but I am not sure how to pull in the data from the other XML file while also counting the artworks and so on. I am very experienced in procedural programming in C and C++, but this declarative style of programming is really turning my head upside down! Hopefully someone can point me in the right direction so that I can get this right.

Upvotes: 2

Views: 2326

Answers (1)

helderdarocha

Reputation: 23627

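This stylesheet generates the output you expect. It requires an XSLT 2.0 processor (for example Saxon), takes authors.xml as the input source, and expects artworks.xml in the same directory as the stylesheet, because doc() resolves a relative URI against the stylesheet's base URI: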

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>

    <xsl:variable name="artworks" select="doc('artworks.xml')/artworks"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="authors/author">
        <xsl:copy>
            <xsl:copy-of select="name"/>
            <xsl:copy-of select="nationality"/>
            <xsl:for-each-group 
                select="$artworks/artwork[author=current()/name]"
                group-by="form">

                <artworks form="{form}" 
                    duration="{min(current-group()/date)}-{max(current-group()/date)}" 
                    quantity="{count(current-group())}">
                    <xsl:apply-templates select="current-group()"/>
                </artworks>

            </xsl:for-each-group>  
        </xsl:copy>
    </xsl:template>

    <xsl:template match="artwork">
        <artwork date="{date}">
            <title><xsl:value-of select="title"/></title>
        </artwork>
    </xsl:template>

</xsl:stylesheet>

Here is an explanation of the code above:

I used an xsl:variable to refer to the artworks subtree of the external document, which is loaded with doc():

<xsl:variable name="artworks" select="doc('artworks.xml')/artworks"/>
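If the file name should not be hard-coded, one possible variation is to pass it in as a stylesheet parameter. This is only a sketch: the parameter name artworks-uri is made up for illustration, and how you set a parameter depends on your processor.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>

    <!-- Hypothetical parameter (not in the original stylesheet):
         defaults to the same file name used above -->
    <xsl:param name="artworks-uri" select="'artworks.xml'"/>

    <!-- A relative URI is still resolved against the stylesheet's base URI -->
    <xsl:variable name="artworks" select="doc($artworks-uri)/artworks"/>

    <!-- The three templates of the stylesheet above go here unchanged -->
</xsl:stylesheet>
```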

This template is an identity transform, which matches any node or attribute and copies it to the output. Its pattern has a lower default priority than the patterns of the other two templates, so it is only used for nodes that the others don't match:

<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>
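To see the priority rule in isolation, here is a minimal sketch that can be run against authors.xml on its own. The second template and the country element name are invented for this illustration and are not part of the solution. The pattern node()|@* has a default priority of -0.5, while a plain element name such as nationality has a default priority of 0, so the more specific template wins for nationality elements and everything else falls through to the identity template.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>

    <!-- Default priority -0.5: fallback for every node and attribute -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Default priority 0: chosen instead of the identity template
         whenever the current node is a nationality element.
         "country" is a hypothetical output name. -->
    <xsl:template match="nationality">
        <country><xsl:value-of select="."/></country>
    </xsl:template>
</xsl:stylesheet>
```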

The second template matches authors/author rather than plain author, because templates are applied to nodes from both documents and artwork also contains an element named author; the longer pattern makes sure only the author elements from authors.xml are matched. The copy-of element copies the entire subtree (elements, content and attributes) of the selected nodes.

<xsl:template match="authors/author">
    <xsl:copy>
        <xsl:copy-of select="name"/>
        <xsl:copy-of select="nationality"/>
        ...
    </xsl:copy>
</xsl:template>

The for-each-group selects the artwork elements from artworks.xml whose author child equals the name of the current author in the input document (authors.xml) and groups them by form. Inside the loop, current-group() returns the artworks of the group being processed; you need it to calculate the minimum and maximum dates, to count the quantity and to output the <artwork> nodes.

<xsl:for-each-group 
    select="$artworks/artwork[author=current()/name]"
    group-by="form">

    <artworks form="{form}" 
        duration="{min(current-group()/date)}-{max(current-group()/date)}" 
        quantity="{count(current-group())}">
        <xsl:apply-templates select="current-group()"/>
    </artworks>

</xsl:for-each-group> 
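As a possible refinement (not required for the output you posted), the groups and the artworks inside each group can be sorted, and current-grouping-key() can be used to read the group's form value directly. The sketch below is a complete variant of the stylesheet under the same assumptions (authors.xml as input, artworks.xml next to the stylesheet); the sort orders are my own choice.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>

    <xsl:variable name="artworks" select="doc('artworks.xml')/artworks"/>

    <!-- Copy the root element and process only its author children -->
    <xsl:template match="/authors">
        <xsl:copy>
            <xsl:apply-templates select="author"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="authors/author">
        <xsl:copy>
            <xsl:copy-of select="name | nationality"/>
            <xsl:for-each-group
                select="$artworks/artwork[author = current()/name]"
                group-by="form">
                <!-- Assumption: groups ordered alphabetically by form -->
                <xsl:sort select="current-grouping-key()"/>

                <artworks form="{current-grouping-key()}"
                    duration="{min(current-group()/date)}-{max(current-group()/date)}"
                    quantity="{count(current-group())}">
                    <!-- Assumption: artworks ordered by year within a group -->
                    <xsl:apply-templates select="current-group()">
                        <xsl:sort select="date" data-type="number"/>
                    </xsl:apply-templates>
                </artworks>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="artwork">
        <artwork date="{date}">
            <title><xsl:value-of select="title"/></title>
        </artwork>
    </xsl:template>
</xsl:stylesheet>
```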

Finally, this template formats each artwork node:

<xsl:template match="artwork">
    <artwork date="{date}">
        <title><xsl:value-of select="title"/></title>
    </artwork>
</xsl:template>

You could do all this differently, in a single template matching the root (/) with several nested for-each blocks, but splitting the work into templates is generally better practice in XSLT.
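For comparison, and since it may feel more familiar coming from C or C++, this is roughly what the single-template version could look like. It is a sketch under the same assumptions as the stylesheet above and is meant to produce the same structure; I would still prefer the template-based version.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>

    <xsl:variable name="artworks" select="doc('artworks.xml')/artworks"/>

    <!-- One template drives everything with nested loops -->
    <xsl:template match="/">
        <authors>
            <xsl:for-each select="authors/author">
                <author>
                    <xsl:copy-of select="name"/>
                    <xsl:copy-of select="nationality"/>

                    <!-- current() is the author selected by the outer loop -->
                    <xsl:for-each-group
                        select="$artworks/artwork[author = current()/name]"
                        group-by="form">
                        <artworks form="{current-grouping-key()}"
                            duration="{min(current-group()/date)}-{max(current-group()/date)}"
                            quantity="{count(current-group())}">
                            <!-- Inner loop replaces the separate artwork template -->
                            <xsl:for-each select="current-group()">
                                <artwork date="{date}">
                                    <title><xsl:value-of select="title"/></title>
                                </artwork>
                            </xsl:for-each>
                        </artworks>
                    </xsl:for-each-group>
                </author>
            </xsl:for-each>
        </authors>
    </xsl:template>
</xsl:stylesheet>
```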

Upvotes: 6
