Chrisnizz

Reputation: 43

Sorting XML file with XSLT or Python based on attribute of child element

I have a number of XML files with a structure like this:

<titles>
  <title mode="example" name="name_example">
    <titleselect>
      <attribute_a>attrib_a</attribute_a>
      <attribute_b>attrib_b</attribute_b>
      <attribute_c>attrib_c</attribute_c>
      <sort_attribute>New York</sort_attribute>
    </titleselect>
  </title>
  <title mode="another_example" name="another_name">
    <titleselect>
      <attribute_a>attrib_a</attribute_a>
      <attribute_b>attrib_b</attribute_b>
      <attribute_c>attrib_c</attribute_c>
      <sort_attribute>Boston</sort_attribute>
    </titleselect>
  </title>
  <title mode="final_example" name="final_name">
    <titleselect>
      <attribute_a>attrib_a</attribute_a>
      <attribute_b>attrib_b</attribute_b>
      <attribute_c>attrib_c</attribute_c>
      <sort_attribute>Chicago</sort_attribute>
    </titleselect>
  </title>
</titles>

I am trying to sort the "title" elements alphabetically by the text of their "sort_attribute" child element. My desired output is like this:

<titles>
  <title mode="another_example" name="another_name">
    <titleselect>
      <attribute_a>attrib_a</attribute_a>
      <attribute_b>attrib_b</attribute_b>
      <attribute_c>attrib_c</attribute_c>
      <sort_attribute>Boston</sort_attribute>
    </titleselect>
  </title>
  <title mode="final_example" name="final_name">
    <titleselect>
      <attribute_a>attrib_a</attribute_a>
      <attribute_b>attrib_b</attribute_b>
      <attribute_c>attrib_c</attribute_c>
      <sort_attribute>Chicago</sort_attribute>
    </titleselect>
  </title>
  <title mode="example" name="name_example">
    <titleselect>
      <attribute_a>attrib_a</attribute_a>
      <attribute_b>attrib_b</attribute_b>
      <attribute_c>attrib_c</attribute_c>
      <sort_attribute>New York</sort_attribute>
    </titleselect>
  </title>
</titles>

Is there any way to achieve this, preferably using XSLT or Python? I am completely new to the world of XSLT. I have tried applying a number of solutions from other relevant questions, e.g. "XSLT sort parent element based on child element attribute", but to no avail.

Upvotes: 1

Views: 171

Answers (2)

DeepSpace

Reputation: 81684

If you are still interested in a Python solution, this can be done with ElementTree from the standard library.

How it works:

  1. Getting all the title nodes
  2. Removing each one from the root node
  3. Sorting the title nodes in memory based on the sort_attribute tag
  4. Adding each title node back to the root element in the correct order


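import xml.etree.ElementTree as ET


def get_sort_attribute_tag_value(node):
    # Sort key: the text of <sort_attribute> inside <titleselect>
    return node.find('titleselect').find('sort_attribute').text

# Parse the input file; xml_node is the root <titles> element
with open('test.xml') as f:
    xml_node = ET.fromstring(f.read())

# 1. Get all the <title> nodes
title_nodes = xml_node.findall('title')

# 2. Remove each one from the root node
for title_node in title_nodes:
    xml_node.remove(title_node)

# 3. Sort the title nodes in memory by their sort_attribute text
title_nodes.sort(key=get_sort_attribute_tag_value)

# 4. Add each title node back to the root in sorted order
for title_node in title_nodes:
    xml_node.append(title_node)

print(ET.tostring(xml_node).decode())

# in order to save as a new file
with open('new_file.xml', 'w') as f:
    f.write(ET.tostring(xml_node).decode())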

Outputs:

<titles>
    <title mode="another_example" name="another_name">
        <titleselect>
            <attribute_a>attrib_a</attribute_a>
            <attribute_b>attrib_b</attribute_b>
            <attribute_c>attrib_c</attribute_c>
            <sort_attribute>Boston</sort_attribute>
        </titleselect>
    </title>
    <title mode="final_example" name="final_name">
        <titleselect>
            <attribute_a>attrib_a</attribute_a>
            <attribute_b>attrib_b</attribute_b>
            <attribute_c>attrib_c</attribute_c>
            <sort_attribute>Chicago</sort_attribute>
        </titleselect>
    </title>
    <title mode="example" name="name_example">
        <titleselect>
            <attribute_a>attrib_a</attribute_a>
            <attribute_b>attrib_b</attribute_b>
            <attribute_c>attrib_c</attribute_c>
            <sort_attribute>New York</sort_attribute>
        </titleselect>
    </title>
</titles>
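
Since the question mentions a number of files, the same idea can be wrapped in a function that takes the XML text and returns the sorted text. The following is only a minimal sketch, not part of the original answer. The names sort_titles, SAMPLE and sort_path are illustrative. It assumes the root element contains nothing but title children (the slice assignment replaces all children of the root), and it compares case-insensitively, which the code above does not. ET.indent only exists in Python 3.9 and later, so that call is guarded.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, shortened from the structure in the question.
SAMPLE = """<titles>
  <title name="name_example"><titleselect><sort_attribute>New York</sort_attribute></titleselect></title>
  <title name="another_name"><titleselect><sort_attribute>Boston</sort_attribute></titleselect></title>
  <title name="final_name"><titleselect><sort_attribute>Chicago</sort_attribute></titleselect></title>
</titles>"""


def sort_titles(xml_text, sort_path="titleselect/sort_attribute"):
    """Return xml_text with the root's <title> children sorted by sort_path."""
    root = ET.fromstring(xml_text)
    titles = root.findall("title")
    # findtext returns "" when the element is missing or empty, so titles
    # without a sort value end up first instead of raising an error.
    # casefold() makes the comparison case-insensitive (an assumption).
    titles.sort(key=lambda t: t.findtext(sort_path, default="").casefold())
    # Assumption: the root holds only <title> children, because slice
    # assignment replaces every child of the root in one step.
    root[:] = titles
    # Normalize whitespace between elements where available (Python 3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


print(sort_titles(SAMPLE))
```

To process several files, one could read each file's text, pass it through sort_titles, and write the returned string to a new file, in the same way the code above writes new_file.xml.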

Upvotes: 1

StuartLC

Reputation: 107387

As an XSLT alternative, as per Tomalek's comment, this is fairly straightforward: add a template that matches the parent titles element and applies templates to its title children sorted by sort_attribute (which is actually an element, not an attribute). The identity transform then copies the content of each title unchanged:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="titles">
    <xsl:copy>
      <xsl:apply-templates select="title">
        <xsl:sort select="titleselect/sort_attribute" data-type="text" order="ascending"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Upvotes: 0
