Reputation: 25
I want to convert an XML document into another standard structure that was given to me, and I believe I can do it with the steps in my code below.
I have this XML example:
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="id-c3ee53e4-e2ef-441b-8f3b-7320c4e32ef8" type="policy" xsi:noNamespaceSchemaLocation="urn:fontoxml:cpa.xsd:5.0">
  <title id="id-f0497441-5ecb-47ee-b7c0-263832a9e402">
    <anchor id="_Toc493170182"/>
    <anchor id="__RefHeading___Toc3574_3674829928"/>
    <anchor id="_Toc72503731"/>
    <anchor id="_Toc69390724"/>
    <anchor id="_Toc493496869"/>
    Abbreviations of Terms
  </title>
  <table frame="all" id="id-6837f232-02e3-4e7a-ce8d-cb2df48256ac">
    <tgroup cols="2" id="id-437c0d54-7257-4d34-a73d-351d533f0460">
      <colspec colname="column-0" colnum="1" colsep="1" rowsep="1" colwidth="0.2*" id="id-c87e1040-c2d7-4b15-fb0c-86557d201235" />
      <colspec colname="column-1" colnum="2" colsep="1" rowsep="1" colwidth="0.8*" id="id-5bebcf85-440b-416e-b2f9-72e47d5bb4f7" />
      <thead id="id-ff67f8a7-5baf-4a42-ac31-09c0f99cceed">
        <row id="id-542df999-7736-4cc2-e725-1b7b106e08d6">
          <entry rowsep="1" colsep="1" colname="column-0" id="id-54a7d605-21ff-44db-c1f6-03111db180c7">
            <para id="id-f43f7fb1-cd40-4b4a-88f2-02e55e786a5e">
              <emphasis style="bold">Abbreviation
              </emphasis>
            </para>
          </entry>
          <entry rowsep="1" colsep="1" colname="column-1" id="id-aecec4c6-f85b-490e-9b72-99c6764b49cf">
            <para id="id-4d89100a-4e4c-419a-d081-f776bcf9083e">
              <emphasis style="bold">Definition
              </emphasis>
            </para>
          </entry>
        </row>
      </thead>
      <tbody id="id-824fc56b-431b-4ad3-e933-f0fc222e50d3">
        <row id="id-620a8ff6-0189-41c7-e9af-dc9498ce703e">
          <entry rowsep="1" colsep="1" colname="column-0" id="id-fb941cc0-287d-4760-a5a0-87419fa66d68">
            <para id="id-127a8a37-9705-496b-87ee-303bcfd52a25">A/C</para>
          </entry>
          <entry rowsep="1" colsep="1" colname="column-1" id="id-317ad682-6e02-43c3-b724-5d50683c8f79">
            <para id="id-c7c2fac5-f286-4802-b8d6-2e54fa2cad3c">AirCraft</para>
          </entry>
        </row>
      </tbody>
    </tgroup>
  </table>
</section>
And this is the code that I have so far:
from lxml import etree
import uuid

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def random_id():
    # Random id in the same "id-<uuid>" style as the example above
    return f"id-{uuid.uuid4()}"

# Parse the XML file
tree = etree.parse("InitialFile")
root = tree.getroot()

# Get the unique tag names (comments and processing instructions have no string tag)
tags = sorted({element.tag for element in root.iter() if isinstance(element.tag, str)})

# Show the unique tag / attribute-name pairs, using the first element of each tag
for tag in tags:
    print(tag, next(root.iter(tag)).attrib.keys())

# Change the tag names to the required tag names (iter() also covers the root element)
for p in list(root.iter("sect1")):
    p.tag = "section"
for p in list(root.iter("informaltable")):
    p.tag = "table"

# Modify the section attributes to their desired form
for cy in root.iter("section"):
    # xmlns:xsi is a namespace declaration, not an attribute, so it cannot be set
    # through attrib; lxml writes the declaration itself once a namespaced
    # attribute is set with the {namespace-uri}name form
    cy.set(f"{{{XSI_NS}}}noNamespaceSchemaLocation", "urn:fontoxml:cpa.xsd:1.0")
    cy.set("id", random_id())
    cy.set("type", "policy")

# Give titles and tables fresh ids
for t in root.iter("title"):
    t.set("id", random_id())
for p in root.iter("table"):
    p.set("id", random_id())

# Remove rowsep from every colspec
for ct in root.iter("colspec"):
    ct.attrib.pop("rowsep", None)

# Print the new XML to make sure it worked
print(etree.tostring(root).decode())
tree.write("Final file.xml")
If you have any other ideas please feel free to share.
Upvotes: 0
Views: 120
Reputation: 167446
I agree that this is a task for XSLT, which lxml supports. Here is an example stylesheet that implements some of your requirements in a modular way, delegating each change to a template of its own:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="1.0">

  <xsl:output method="xml"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="sect1">
    <section>
      <xsl:apply-templates select="@* | node()"/>
    </section>
  </xsl:template>

  <xsl:template match="informaltable">
    <table>
      <xsl:apply-templates select="@* | node()"/>
    </table>
  </xsl:template>

  <xsl:template match="@id">
    <xsl:attribute name="{name()}">
      <xsl:value-of select="generate-id()"/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="@xsi:noNamespaceSchemaLocation">
    <xsl:attribute name="{name()}" namespace="{namespace-uri()}">urn:fontoxml:cpa.xsd:1.0</xsl:attribute>
  </xsl:template>

  <xsl:template match="colspec/@rowsep"/>

</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/bET2rXs
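To apply a stylesheet like this from Python, lxml's `etree.XSLT` can be used. The following is only a minimal sketch: the stylesheet is a cut-down version of the one above (identity copy, the `sect1` rename and the `rowsep` removal), and the sample input is a made-up fragment, not your real file.

```python
from lxml import etree

# Cut-down stylesheet: identity copy, one rename, one attribute removal.
XSLT_SOURCE = b"""\
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="sect1">
    <section>
      <xsl:apply-templates select="@* | node()"/>
    </section>
  </xsl:template>
  <xsl:template match="colspec/@rowsep"/>
</xsl:stylesheet>
"""

# Hypothetical sample input, only for illustration.
SAMPLE = (
    b'<sect1 id="a"><title>Abbreviations</title>'
    b'<colspec colname="column-0" rowsep="1"/></sect1>'
)

# Compile the stylesheet once, then call it like a function on the input.
transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(SAMPLE))
new_root = result.getroot()

print(etree.tostring(new_root, pretty_print=True).decode())
```

For real files you would presumably pass `etree.parse(...)` results for both the stylesheet and the input instead of the inline byte strings; `str(result)` serializes the output according to the stylesheet's `xsl:output` settings.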
I hope that, with this as a starting point and any XSLT tutorial or introduction, you can work out the rest.
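On the colon problem in your Python attempt: `xmlns:xsi` is a namespace declaration rather than an ordinary attribute, and lxml does not accept prefixed names such as `xsi:noNamespaceSchemaLocation` as attribute names. A namespaced attribute is set with the `{namespace-uri}local-name` form instead, and lxml writes the matching declaration when it serializes. The sketch below assumes you want UUID-based ids like the ones in your example; the helper name `set_section_attributes` is made up for illustration.

```python
import uuid
from lxml import etree

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_ATTR = f"{{{XSI_NS}}}noNamespaceSchemaLocation"  # {namespace-uri}local-name

def set_section_attributes(section):
    """Set id, type and the xsi schema location on a <section> element."""
    # Assumption: ids follow the "id-<uuid>" pattern seen in the example XML.
    section.set("id", f"id-{uuid.uuid4()}")
    section.set("type", "policy")
    # No colon in the name: the namespace is given by its URI in braces.
    section.set(SCHEMA_ATTR, "urn:fontoxml:cpa.xsd:1.0")

root = etree.fromstring("<section><title>Abbreviations of Terms</title></section>")
set_section_attributes(root)

serialized = etree.tostring(root)
print(serialized.decode())
```

Reparsing the serialized bytes and reading the attribute back with the same `{namespace-uri}local-name` key is a quick way to confirm that the declaration was written correctly.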
Upvotes: 1