GWorking

Reputation: 4341

XSLT sorting groups of nodes previously selected with key

I have an XML file with a group of protein nodes, each with an accession number, and a group of peptide nodes, each with accession, sequence, RT and score values.

I also have an XSLT file that, for each protein, selects all the peptide nodes that have the same accession number as the protein.

Then, depending on the value of a parameter, here <xsl:param name="analysis" select="0"/>, the XSLT file selects either all peptides that share the accession number, or only one peptide for each distinct sequence among those that share the accession number.

Here is the code that does this (changing the value of the parameter from 0 to 1 switches between the two situations described above). The code is also pasted at the end of the post.

Now what I need is to sort the peptide nodes and select the ones with the maximum score.

So, in the case where the XSLT file selects all peptides that share the same accession number, I need them sorted by their score values.

And in the case where the code selects only one peptide from those that share both accession number and sequence, I need that peptide to be the one with the maximum score, not just the first one that appears in the XML file.

I have tried to use xsl:sort in this code (link), but the XML output then orders all the peptides together, losing the grouping previously done with the key.

XML code

<data>
    <proteins>
        <protein>
            <accession>111</accession>
        </protein>
    </proteins>
    <peptides>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <RT>13</RT>
            <score>4000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <RT>14</RT>
            <score>6000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <RT>15</RT>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <RT>23</RT>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <RT>24</RT>
            <score>1000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <RT>25</RT>
            <score>8000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <RT>26</RT>
            <score>5000</score>
        </peptide>
    </peptides>
</data>

XSLT code

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:param name="analysis" select="0"/>
    <xsl:key name="byAcc"    match="/data/peptides/peptide" use="accession" />
    <xsl:key name="byAccSeq" match="/data/peptides/peptide" use="concat(accession, '|', sequence)"/>
    <xsl:template match="/">
        <root>
            <name>
                <xsl:value-of select="$analysis"/>
            </name>
            <xsl:apply-templates select="/data/proteins/protein" />
        </root>
    </xsl:template>
    <xsl:template match="/data/proteins/protein">
        <xsl:choose>
            <xsl:when test="$analysis=1">
                <xsl:apply-templates select="key('byAcc',accession)"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates select="key('byAcc',accession)[
                    generate-id()
                    =
                    generate-id(key('byAccSeq', concat(accession, '|', sequence)))]"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template match="/data/peptides/peptide">
        <xsl:copy-of select="."/>
    </xsl:template>
</xsl:stylesheet>

And this is the XML output that I need, shown for simplicity only for the case where a single peptide per sequence is selected (<xsl:param name="analysis" select="0"/>):

<root>
    <name>0</name>
    <peptide>
        <accession>111</accession>
        <sequence>AAA</sequence>
        <RT>14</RT>
        <score>6000</score>
    </peptide>
    <peptide>
        <accession>111</accession>
        <sequence>BBB</sequence>
        <RT>25</RT>
        <score>8000</score>
    </peptide>
</root>

That is, from the peptide nodes that share accession and sequence values, the one with the maximum score value.

Thanks

---------------------------------------------------------------------------

EDIT: Trying to make the question clearer

The most simplified code I can think of would be:

I have this XML code

<data>
    <peptides>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <score>4000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <score>6000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <score>1000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <score>8000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>222</accession>
            <sequence>CCC</sequence>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>222</accession>
            <sequence>CCC</sequence>
            <score>9000</score>
        </peptide>
        <peptide>
            <accession>222</accession>
            <sequence>CCC</sequence>
            <score>2000</score>
        </peptide>
  </peptides>
</data>

The nodes can appear in any position in this XML; nothing can be assumed about how the input is sorted.

Then, what I want is to group the nodes first by accession, then within those nodes, to group them by sequence, and then within those nodes, to sort them by score.

So the output XML would be this

<data>
    <peptides>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <score>6000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
            <score>4000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <score>8000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
            <score>1000</score>
        </peptide>
        <peptide>
            <accession>222</accession>
            <sequence>CCC</sequence>
            <score>9000</score>
        </peptide>
        <peptide>
            <accession>222</accession>
            <sequence>CCC</sequence>
            <score>5000</score>
        </peptide>
        <peptide>
            <accession>222</accession>
            <sequence>CCC</sequence>
            <score>2000</score>
        </peptide>
  </peptides>
</data>

SOLUTION to this last code, from the answer kindly provided:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:key name="byAcc"    match="/data/peptides/peptide" use="accession" />
    <xsl:key name="byAccSeq" match="/data/peptides/peptide" use="concat(accession, '|', sequence)"/>
    <xsl:template match="/">
        <root>
            <xsl:apply-templates select="/data/proteins/protein">
            </xsl:apply-templates>
        </root>
    </xsl:template>
    <xsl:template match="/data/proteins/protein">
        <xsl:apply-templates select="key('byAcc',accession)">
            <xsl:sort select="sequence" data-type="text"/>
            <xsl:sort select="score" data-type="number"/>
        </xsl:apply-templates>
    </xsl:template>
    <xsl:template match="/data/peptides/peptide">
        <xsl:copy-of select="."/>
    </xsl:template>
</xsl:stylesheet>
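
Note that the stylesheet above starts from /data/proteins/protein, which the simplified input does not contain, and that xsl:sort orders ascending by default, whereas the desired output lists the highest score first. A minimal sketch that works directly on the simplified input, assuming the peptides always sit under /data/peptides, could look like this (the three sort keys are my reading of "group by accession, then by sequence, then sort by score"):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:template match="/data">
        <data>
            <peptides>
                <!-- Sort keys apply in order: accession, then sequence,
                     then score with the highest value first -->
                <xsl:for-each select="peptides/peptide">
                    <xsl:sort select="accession" data-type="text"/>
                    <xsl:sort select="sequence" data-type="text"/>
                    <xsl:sort select="score" data-type="number" order="descending"/>
                    <xsl:copy-of select="."/>
                </xsl:for-each>
            </peptides>
        </data>
    </xsl:template>
</xsl:stylesheet>
```

Because every peptide is kept, no key is needed here: sorting on accession and sequence first keeps the members of each group next to each other, and peptides with equal scores stay in document order.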


Answers (1)

Dimitre Novatchev

Reputation: 243549

It isn't clear exactly what processing is wanted, but here I provide two different combinations of grouping and sorting (I also assume that XSLT 1.0 is required):

Let's have the following very simple XML document:

<nums>
 <num>5</num>
 <num>1</num>
 <num>2</num>
 <num>2</num>
 <num>9</num>
 <num>1</num>
 <num>5</num>
 <num>2</num>
 <num>9</num>
 <num>8</num>
</nums>

If we want to eliminate the duplicates, we may start with this transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:key name="kNumByVal" match="num" use="."/>

 <xsl:template match="/*">
  <nums>
    <xsl:copy-of select=
    "num
      [generate-id(key('kNumByVal', .)[1])
      =
       generate-id()
      ]"/>
  </nums>
 </xsl:template>
</xsl:stylesheet>

the result is:

<nums>
   <num>5</num>
   <num>1</num>
   <num>2</num>
   <num>9</num>
   <num>8</num>
</nums>

Now, if we want in the result the num elements to be sorted by value, there are two different solutions:

  1. Sort the original XML document and then perform the grouping:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common" exclude-result-prefixes="ext">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:key name="kNumByVal" match="num" use="."/>

 <xsl:template match="/*">

  <xsl:variable name="vrtfSorted">
   <xsl:for-each select="num">
    <xsl:sort data-type="number"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </xsl:variable>

  <xsl:variable name="vSorted" select=
       "ext:node-set($vrtfSorted)/*"/>

  <nums>
   <xsl:for-each select=
    "$vSorted
      [generate-id(key('kNumByVal', .)[1])
      =
       generate-id()
      ]
     ">
    <xsl:sort data-type="number"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </nums>
 </xsl:template>
</xsl:stylesheet>

the correct result is produced:

<nums>
   <num>1</num>
   <num>2</num>
   <num>5</num>
   <num>8</num>
   <num>9</num>
</nums>

  2. Perform the grouping and then sort the result:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:key name="kNumByVal" match="num" use="."/>

 <xsl:template match="/*">

    <xsl:variable name="vDistinct" select=
    "num
      [generate-id(key('kNumByVal', .)[1])
      =
       generate-id()
      ]"/>

  <nums>
   <xsl:for-each select="$vDistinct">
    <xsl:sort data-type="number"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </nums>
 </xsl:template>
</xsl:stylesheet>

again the correct result is produced:

<nums>
   <num>1</num>
   <num>2</num>
   <num>5</num>
   <num>8</num>
   <num>9</num>
</nums>

Update: After clarifications in a comment by the OP, here is a new example:

<nums>
  <num seq="1">01</num>
  <num seq="2">01</num>
  <num seq="1">01</num>
  <num seq="3">01</num>
  <num seq="2">01</num>
  <num seq="4">01</num>
  <num seq="1">02</num>
  <num seq="2">3</num>
  <num seq="3">04</num>
  <num seq="3">01</num>
  <num seq="4">02</num>
  <num seq="1">01</num>
</nums>

The requirement is to group by @seq and by the string value of the element and to sort by both @seq and the string value:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kBySeqVal" match="num"
  use="concat(@seq, '+', .)"/>

 <xsl:key name="kByValSeq" match="num"
  use="concat(., '+', @seq)"/>

 <xsl:template match="/">
  <xsl:apply-templates select=
   "/*/*
     [generate-id()
     =
      generate-id(key('kBySeqVal',
                      concat(@seq, '+', .)
                      )
                       [1]
                  )
     ]
   ">
    <xsl:sort select="@seq" data-type="number"/>
    <xsl:sort select="." data-type="number"/>
   </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="num">
  <xsl:copy-of select="."/>
 </xsl:template>
</xsl:stylesheet>

and the result is grouped and sorted exactly as wanted:

<num seq="1">01</num>
<num seq="1">02</num>
<num seq="2">01</num>
<num seq="2">3</num>
<num seq="3">01</num>
<num seq="3">04</num>
<num seq="4">01</num>
<num seq="4">02</num>
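
Applied to the peptide data in the question, the same grouping idea can be combined with an inner sort to keep only the highest-scoring peptide of each (accession, sequence) group. The following is only a sketch under the assumption that the input has the /data/proteins and /data/peptides structure shown in the question; the key names are taken from the question's stylesheet, and the `analysis` parameter and the `name` element are left out for brevity:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:key name="byAcc"    match="/data/peptides/peptide" use="accession"/>
    <xsl:key name="byAccSeq" match="/data/peptides/peptide" use="concat(accession, '|', sequence)"/>

    <xsl:template match="/">
        <root>
            <xsl:apply-templates select="/data/proteins/protein"/>
        </root>
    </xsl:template>

    <xsl:template match="protein">
        <!-- Outer loop: one representative peptide per (accession, sequence)
             group, namely the first one in document order -->
        <xsl:for-each select="key('byAcc', accession)
                              [generate-id()
                               =
                               generate-id(key('byAccSeq', concat(accession, '|', sequence))[1])]">
            <xsl:sort select="sequence" data-type="text"/>
            <!-- Inner loop: all members of that group, highest score first;
                 position() refers to the sorted order, so only the top one is copied -->
            <xsl:for-each select="key('byAccSeq', concat(accession, '|', sequence))">
                <xsl:sort select="score" data-type="number" order="descending"/>
                <xsl:if test="position() = 1">
                    <xsl:copy-of select="."/>
                </xsl:if>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

With the first XML document of the question this should copy the AAA peptide with score 6000 and the BBB peptide with score 8000. Removing the `xsl:if` wrapper would instead output every peptide, grouped by sequence and sorted by score within each group.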
