ABach

Reputation: 3738

Building out missing parts of XML via XSLT

XSLT Version: 1.0

Data (how it is "rendered"):

Rendered Data

Data (how it is stored as XML):

<data>
  <item>
    <row>Row1</row>
    <col>Col2</col>
    <value>323</value>
  </item>
  <item>
    <row>Row2</row>
    <col>Col1</col>
    <value>12</value>
  </item>
  <item>
    <row>Row2</row>
    <col>Col2</col>
    <value>53</value>
  </item>
</data>

Note how the empty "cell" (Row1/Col1) is completely missing from the XML data.

What I Need:

I need to fill out the rest of the "structure" such that empty "cells" have corresponding empty elements in the XML:

<data>
  <!-- New, "empty" item gets created -->
  <item>
    <row>Row1</row>
    <col>Col1</col>
    <value />
  </item>
  <!-- Output the others as before -->
  <item>
    <row>Row1</row>
    <col>Col2</col>
    <value>323</value>
  </item>
  <item>
    <row>Row2</row>
    <col>Col1</col>
    <value>12</value>
  </item>
  <item>
    <row>Row2</row>
    <col>Col2</col>
    <value>53</value>
  </item>
</data>

The Catch:

This sample data is much, much smaller than my target data set. The real data might have hundreds of rows and columns with empty "cells" all over the place. Therefore, I can't hardcode anything.

My "Solution" Thus Far:

I've considered using Muenchian Grouping to pick out all unique column and row names; then, having those, I would traverse each combination (Row1/Col1, Row2/Col2, etc.) and check for the existence of an <item> element with those values in the source document. Should I find one, I copy it (along with its descendants); should I not find one, I output the appropriate "empty" elements.

That sounds too procedural to me (such that I'm having a hard time even starting an XSLT document). There has to be a better way.

I appreciate any pointers you can give. :)

UPDATE:

Unfortunately, the solution cannot count on the rows and columns having sequential numbers in their values; they are merely presented this way for ease-of-demonstration. For instance, instead of "Row2", the value of the first column for that row might as well be "Peanut Butter and Jelly".

<item> elements are arrayed sequentially in the source XML: left to right (by column), top to bottom (by row).

Upvotes: 5

Views: 1723

Answers (2)

nine9ths

Reputation: 795

Here's a stylesheet that does something along the lines of what you proposed. Note, though, that the order in which the table is created depends on the input and could change depending on which data is missing.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="xs exsl"
  version="1.0">

  <xsl:strip-space elements="*"/>

  <xsl:output indent="yes"/>

  <xsl:variable name="doc" select="/"/>

  <xsl:key name="rows" match="row" use="."/>

  <xsl:variable name="rows">
    <xsl:for-each select="//row[generate-id() = generate-id(key('rows', .)[1])]">
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:key name="cols" match="col" use="."/>

  <xsl:variable name="cols">
    <xsl:for-each select="//col[generate-id() = generate-id(key('cols', .)[1])]">
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:key name="by.rowcol" match="item" use="concat(row,col)"/>

  <xsl:template match="data">
    <xsl:copy>
      <xsl:for-each select="exsl:node-set($rows)/row">
        <xsl:variable name="row" select="."/>
        <xsl:for-each select="exsl:node-set($cols)/col">
          <xsl:variable name="col" select="."/>
          <xsl:for-each select="$doc">
            <xsl:choose>
              <xsl:when test="key('by.rowcol',concat($row,$col))">
                <xsl:copy-of select="key('by.rowcol',concat($row,$col))"/>
              </xsl:when>
              <xsl:otherwise>
                <item>
                  <xsl:copy-of select="$row"/>
                  <xsl:copy-of select="$col"/>
                  <value/>
                </item>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
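
If the exsl:node-set() extension is not available, the same row-by-column walk can probably be done without it, because the row and col nodes picked out by Muenchian grouping still live in the source tree and the keys can be used on them directly. The following is a minimal sketch rather than a tested drop-in replacement. The '|' separator in the by.rowcol key is my assumption (any string that never occurs in a row or col name would do); it guards against two different row/col pairs concatenating to the same string. Columns still come out in order of first appearance, as in the stylesheet above.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <xsl:key name="rows" match="row" use="."/>
  <xsl:key name="cols" match="col" use="."/>
  <!-- Assumption: '|' never occurs inside a row or col name -->
  <xsl:key name="by.rowcol" match="item" use="concat(row, '|', col)"/>

  <xsl:template match="data">
    <!-- First occurrence of each distinct col name, selected from the source tree -->
    <xsl:variable name="cols"
      select="item/col[generate-id() = generate-id(key('cols', .)[1])]"/>
    <xsl:copy>
      <!-- First occurrence of each distinct row name -->
      <xsl:for-each select="item/row[generate-id() = generate-id(key('rows', .)[1])]">
        <xsl:variable name="row" select="."/>
        <xsl:for-each select="$cols">
          <!-- The context node is now a col element of the source document -->
          <xsl:variable name="cell" select="key('by.rowcol', concat($row, '|', .))"/>
          <xsl:choose>
            <xsl:when test="$cell">
              <xsl:copy-of select="$cell"/>
            </xsl:when>
            <xsl:otherwise>
              <!-- No item for this row/col pair: emit an empty cell -->
              <item>
                <xsl:copy-of select="$row"/>
                <xsl:copy-of select="."/>
                <value/>
              </item>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because $cols is an ordinary node-set rather than a result tree fragment, no conversion is needed before iterating over it, and key() keeps resolving against the source document inside both loops.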

Alternatively, here's a stylesheet that will do what you want by iterating over the elements serially, provided the row and col values have the form "RowN" and "ColN" with a numeric N:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="1.0">

  <xsl:strip-space elements="*"/>

  <xsl:output indent="yes"/>

  <!-- Figure out how wide the table is -->
  <xsl:variable name="max.col">
    <xsl:for-each select="//col">
      <xsl:sort select="substring-after(.,'Col')" data-type="number" order="descending"/>
      <xsl:if test="position() = 1">
        <xsl:value-of select="substring-after(.,'Col')"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>

  <!-- The identity template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="data">
    <xsl:copy>
      <!-- Start off processing with the first item in the document -->
      <xsl:apply-templates select="item[1]">
        <!-- We expect the coordinates to be (1,1) -->
        <xsl:with-param name="expected.row" select="1"/>
        <xsl:with-param name="expected.col" select="1"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="item">
    <xsl:param name="expected.row"/>
    <xsl:param name="expected.col"/>

    <!-- Figure out what coordinates this item is at -->
    <xsl:variable name="row" select="substring-after(row,'Row')"/>
    <xsl:variable name="col" select="substring-after(col,'Col')"/>

    <!-- Check to see if we're the last item in the row -->
    <xsl:variable name="is.last-in-row" select="not(following-sibling::item[row = current()/row])"/>

    <!-- Check to see if we skipped any rows -->
    <xsl:if test="$row > $expected.row">
      <!-- Call a template to recursively create the skipped rows of items -->
      <xsl:call-template name="fill.row">
        <xsl:with-param name="row" select="$expected.row"/>
        <xsl:with-param name="stop.row" select="$row - 1"/>
      </xsl:call-template>
    </xsl:if>

    <!-- We're further along than we expected, which means some items were skipped -->
    <xsl:if test="$col > $expected.col">
      <!-- Call a template to recursively create the skipped items -->
      <xsl:call-template name="fill.col">
        <xsl:with-param name="row" select="$row"/>
        <xsl:with-param name="col" select="$expected.col"/>
        <xsl:with-param name="stop.col" select="$col - 1"/>
      </xsl:call-template>
    </xsl:if>

    <!-- Copy the item we're on -->
    <xsl:copy-of select="."/>

    <!-- If this is the last item on the row and there are missing items create them -->
    <xsl:if test="$is.last-in-row and $max.col > $col">
      <xsl:call-template name="fill.col">
        <xsl:with-param name="row" select="$row"/>
        <xsl:with-param name="col" select="$col + 1"/>
        <xsl:with-param name="stop.col" select="$max.col"/>
      </xsl:call-template>
    </xsl:if>

    <!-- Move on to the next item -->
    <xsl:choose>
      <xsl:when test="$is.last-in-row">
        <!-- If we're the last in the row, expect the row after the current one and reset expected.col.
             Using $row (not $expected.row) avoids re-creating rows that were just filled in. -->
        <xsl:apply-templates select="following-sibling::item[1]">
          <xsl:with-param name="expected.row" select="$row + 1"/>
          <xsl:with-param name="expected.col" select="1"/>
        </xsl:apply-templates>
      </xsl:when>
      <xsl:otherwise>
        <!-- Stay on the current row and expect the column after the current one.
             Using $col (not $expected.col) avoids re-creating items that were just filled in. -->
        <xsl:apply-templates select="following-sibling::item[1]">
          <xsl:with-param name="expected.row" select="$row"/>
          <xsl:with-param name="expected.col" select="$col + 1"/>
        </xsl:apply-templates>
      </xsl:otherwise>
    </xsl:choose>

  </xsl:template>

  <!-- Recursively create item elements with the given $row for all the cols from $col to $stop.col inclusive -->
  <xsl:template name="fill.col">
    <xsl:param name="row"/>
    <xsl:param name="col"/>
    <xsl:param name="stop.col"/>
    <xsl:if test="$stop.col >= $col">
      <item>
        <row><xsl:value-of select="concat('Row',$row)"/></row>
        <col><xsl:value-of select="concat('Col',$col)"/></col>
        <value/>
      </item>
      <xsl:call-template name="fill.col">
        <xsl:with-param name="row" select="$row"/>
        <xsl:with-param name="col" select="$col + 1"/>
        <xsl:with-param name="stop.col" select="$stop.col"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Recursively create $max.col length rows of item elements from $row to $stop.row inclusive  -->
  <xsl:template name="fill.row">
    <xsl:param name="row"/>
    <xsl:param name="stop.row"/>
    <xsl:if test="$stop.row >= $row">
      <xsl:call-template name="fill.col">
        <xsl:with-param name="row" select="$row"/>
        <xsl:with-param name="col" select="1"/>
        <xsl:with-param name="stop.col" select="$max.col"/>
      </xsl:call-template>
      <xsl:call-template name="fill.row">
        <xsl:with-param name="row" select="$row + 1"/>
        <xsl:with-param name="stop.row" select="$stop.row"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

Upvotes: 2

Sean B. Durkin

Reputation: 12729

My approach would be...

  1. Compute the maximum column width and stick it in a variable.
  2. Do the same for Rows.
  3. Use the Piez method to iterate over rows and columns, testing for empty cells and outputting appropriately.

XSLT 1.0 Solution (Piez style)

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:so="http://stackoverflow.com/questions/13575269"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="xsl so exsl">
<xsl:output indent="yes" omit-xml-declaration="yes" />
<xsl:strip-space elements="*" />    

<xsl:variable name="rank-and-file">
  <xsl:apply-templates select="/*" mode="counting" />
</xsl:variable>

<xsl:variable name="col-count">
  <xsl:for-each select="exsl:node-set($rank-and-file)/so:col">
    <xsl:sort select="." data-type="number" order="descending" />
    <xsl:if test="position() = 1">
      <xsl:value-of select="."/>
    </xsl:if>
  </xsl:for-each>  
</xsl:variable>

<xsl:variable name="row-count">
  <xsl:for-each select="exsl:node-set($rank-and-file)/so:row">
    <xsl:sort select="." data-type="number" order="descending" />
    <xsl:if test="position() = 1">
      <xsl:value-of select="."/>
    </xsl:if>
  </xsl:for-each>  
</xsl:variable>

<xsl:template match="*" mode="counting">
  <xsl:apply-templates mode="counting" />
</xsl:template>  

<xsl:template match="row" mode="counting">
  <so:row>
    <xsl:value-of select="substring(.,4)" />
  </so:row>  
</xsl:template>  

<xsl:template match="col" mode="counting">
  <so:col>
    <xsl:value-of select="substring(.,4)" />
  </so:col>  
</xsl:template>  

<xsl:template match="/*">
  <xsl:variable name="data" select="." />
  <xsl:copy> 
    <xsl:for-each select="(((/)//*)/node())[position() &lt;= $row-count]">
      <xsl:variable name="row" select="position()" />
      <xsl:for-each select="(((/)//*)/node())[position() &lt;= $col-count]">
        <xsl:variable name="col" select="position()" />
        <xsl:variable name="cell" select="$data/item[row=concat('Row',$row)]
                                                    [col=concat('Col',$col)]" />
        <xsl:copy-of select="$cell" />
        <xsl:if test="not( $cell)">
          <item>
            <row><xsl:value-of select="concat('Row',$row)" /></row>
            <col><xsl:value-of select="concat('Col',$col)" /></col>
            <value/>
          </item>
        </xsl:if>
      </xsl:for-each>  
    </xsl:for-each>  
  </xsl:copy>  
</xsl:template>

</xsl:stylesheet>

Caveat emptor

  1. Of course, the usual cautions about the Piez method apply. Let us know if Piez is not suitable.
  2. A side effect is that the rows and columns become sorted. This may or may not be a good thing depending on what you want.
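
To make the first caution concrete: a Piez-style loop borrows arbitrary nodes from the source document as counters, so it can only count as high as the number of nodes it can select. A small guard along these lines could report the problem instead of silently producing a truncated table. This is only a sketch; the `needed` parameter and the `pool` variable are hypothetical names introduced for illustration and are not part of the stylesheet above.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: how many loop iterations are required -->
  <xsl:param name="needed" select="5"/>

  <!-- The pool of arbitrary source nodes that the loop borrows as counters -->
  <xsl:variable name="pool" select="//node()"/>

  <xsl:template match="/">
    <!-- Stop early if the document has too few nodes to count that high -->
    <xsl:if test="count($pool) &lt; $needed">
      <xsl:message terminate="yes">
        <xsl:text>Piez loop needs </xsl:text>
        <xsl:value-of select="$needed"/>
        <xsl:text> nodes, but the document only provides </xsl:text>
        <xsl:value-of select="count($pool)"/>
      </xsl:message>
    </xsl:if>

    <!-- The loop itself: position() runs from 1 up to $needed -->
    <xsl:for-each select="$pool[position() &lt;= $needed]">
      <xsl:value-of select="position()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The nodes in the pool are never inspected; only position() matters, which is why any sufficiently large node-set will do.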

Update

If you have a super-sparse input document and Piez limits become an issue, here is a safer (but slower) alternative.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:so="http://stackoverflow.com/questions/13575269"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="xsl so exsl">
<xsl:output indent="yes" omit-xml-declaration="yes" />
<xsl:strip-space elements="*" />    

<xsl:variable name="rank-and-file">
  <xsl:apply-templates select="/*" mode="counting" />
</xsl:variable>

<xsl:variable name="col-count">
  <xsl:for-each select="exsl:node-set($rank-and-file)/so:col">
    <xsl:sort select="." data-type="number" order="descending" />
    <xsl:if test="position() = 1">
      <xsl:value-of select="."/>
    </xsl:if>
  </xsl:for-each>  
</xsl:variable>

<xsl:variable name="row-count">
  <xsl:for-each select="exsl:node-set($rank-and-file)/so:row">
    <xsl:sort select="." data-type="number" order="descending" />
    <xsl:if test="position() = 1">
      <xsl:value-of select="."/>
    </xsl:if>
  </xsl:for-each>  
</xsl:variable>

<xsl:template match="*" mode="counting">
  <xsl:apply-templates mode="counting" />
</xsl:template>  

<xsl:template match="row" mode="counting">
  <so:row>
    <xsl:value-of select="substring(.,4)" />
  </so:row>  
</xsl:template>  

<xsl:template match="col" mode="counting">
  <so:col>
    <xsl:value-of select="substring(.,4)" />
  </so:col>  
</xsl:template>  

<xsl:template name="make-counters">
  <xsl:param name="count" />
  <so:_/><so:_/><so:_/><so:_/><so:_/><so:_/><so:_/><so:_/>
  <xsl:if test="$count &gt; 8">
    <xsl:call-template name="make-counters">
      <xsl:with-param name="count" select="$count - 4" />
    </xsl:call-template>  
  </xsl:if>  
</xsl:template>

<xsl:variable name="counters-doc">
  <xsl:call-template name="make-counters">
    <xsl:with-param name="count" select="$col-count + $row-count" />
  </xsl:call-template>  
</xsl:variable>

<xsl:variable name="counters" select="exsl:node-set($counters-doc)/*" />

<xsl:template match="/*">
  <xsl:variable name="data" select="." />
  <xsl:copy> 
    <xsl:for-each select="$counters[position() &lt;= $row-count]">
      <xsl:variable name="row" select="position()" />
      <xsl:for-each select="$counters[position() &lt;= $col-count]">
        <xsl:variable name="col" select="position()" />
        <xsl:variable name="cell" select="$data/item[row=concat('Row',$row)]
                                                    [col=concat('Col',$col)]" />
        <xsl:copy-of select="$cell" />
        <xsl:if test="not( $cell)">
          <item>
            <row><xsl:value-of select="concat('Row',$row)" /></row>
            <col><xsl:value-of select="concat('Col',$col)" /></col>
            <value/>
          </item>
        </xsl:if>
      </xsl:for-each>  
    </xsl:for-each>  
  </xsl:copy>  
</xsl:template>

</xsl:stylesheet>

Upvotes: 1
