Reputation: 1445
For Validation Purposes: How can I search thorough a whole XML node by node(even children) as below:
XML File:
<Summary>
<Hardware_Info>
<HardwareType>FlashDrive</HardwareType>
<ManufacturerDetail>
<ManufacturerCompany>Company1</ManufacturerCompany>
<ManufacturerDate>2017-07-20T12:26:04-04:00</ManufacturerDate>
<ModelCode>4BR6282</ModelCode>
</ManufacturerDetail>
<ActivationDate>2017-07-20T12:26:04-04:00</ActivationDate>
</Hardware_Info>
<DeviceConnectionInfo>
<Device>
<Index>0</Index>
<Name>Laptop1</Name>
<Status>Installed</Status>
</Device>
<Device>
<Index>1</Index>
<Name>Laptop2</Name>
<Status>Installed</Status>
</Device>
</DeviceConnectionInfo>
</Summary>
and search for the values based on matching columns of a particular Table. For the sake of an example the table is as is:
TABLE:
HardwareType ManufacturerCompany ManufacturerDate ActivationDate Device.Index Name
FlashDrive Company1 2017-07-20T12:26:04-04:00 2017-07-20T12:26:04-04:00 0 Laptop1
FlashDrive Company2 2017-07-20T12:26:04-04:00 2017-07-20T12:26:04-04:00 1 Laptop2
In this case I would have a column list of:
HardwareType, ManufacturerCompany, ManufacturerDate, ActivationDate, Device.Index, Name
For my final result I would like to print the values of the table column names along with the values of the table names found on the xml. For example similar like the original table (assuming validation is fine):
OUTPUT RESULT :
HardwareType ManufacturerCompany ManufacturerDate ActivationDate Device.Index Name
FlashDrive Company1 2017-07-20T12:26:04-04:00 2017-07-20T12:26:04-04:00 0 Laptop1
FlashDrive Company2 2017-07-20T12:26:04-04:00 2017-07-20T12:26:04-04:00 1 Laptop2
CURRENT IMPLEMENTATION:
For example I am able to get the list of the column names of the table, however but so far the best to my nkowledge to implement this is:
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("/test.xml")
root = tree.getroot()
f = open('/test.csv', 'w')
csvwriter = csv.writer(f)
count = 0
head = ['ManufacturerCompany','ManufacturerDate',...]
csvwriter.writerow(head)
for time in root.findall('Summary'):
row = []
job_name = time.find('ManufacturerDetail').find('ManufacturerCompany').text
row.append(job_name)
job_name = time.find('ManufacturerDetail').find('ManufacturerDate').text
row.append(job_name)
csvwriter.writerow(row)
f.close()
However, this implementation does not have the loop each functionality I want for the output. Any guidance or suggestions for implementation would be great.
Thanks
Upvotes: 0
Views: 984
Reputation: 107567
Consider XSLT, the special purpose language designed to transform XML files into other XML, HTML (which mostly known for), but also text files (TXT/CSV) with its method="text"
. Specifically, walk down to the Device node level and bring over the ancestor items.
Python's third-party lxml module can run XSLT 1.0 scripts. However, XSLT is portable and any XSLT processor can run such code including Unix's (Linux/Mac) available xsltproc.
XSLT (save as .xsl file, a special .xml file; 

is the line break entity)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="text"/>
<xsl:strip-space elements="*"/>
<xsl:param name="delimiter">,</xsl:param>
<xsl:template match="/Summary">
<xsl:text>HardwareType,ManufacturerCompany,ManufacturerDate,ActivationDate,Device.Index,Name
</xsl:text>
<xsl:apply-templates select="DeviceConnectionInfo"/>
</xsl:template>
<xsl:template match="DeviceConnectionInfo">
<xsl:apply-templates select="Device"/>
</xsl:template>
<xsl:template match="Device">
<xsl:value-of select="concat(ancestor::Summary/Hardware_Info/HardwareType, $delimiter,
ancestor::Summary/Hardware_Info/ManufacturerDetail/ManufacturerCompany, $delimiter,
ancestor::Summary/Hardware_Info/ManufacturerDetail/ManufacturerDate, $delimiter,
ancestor::Summary/Hardware_Info/ActivationDate, $delimiter,
Index, $delimiter,
Name)"/><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Python (with lxml)
import lxml.etree as et
# LOAD XML AND XSL
doc = et.parse('input.xml')
xsl = et.parse('xslt_script.xsl')
# TRANSFORM INPUT TO STRING
transform = et.XSLT(xsl)
result = str(transform(doc))
# SAVE TO FILE
with open('output.csv', 'w') as f:
f.write(result)
Python (one-line command call to xsltproc)
from subprocess import Popen
proc = Popen(['xsltproc -o output.csv xslt_script.xsl input.xml'],
shell=True, cwd='/path/to/working/directory')
Output
# HardwareType ManufacturerCompany ManufacturerDate ActivationDate Device.Index Name
# FlashDrive Company1 2017-07-20T12:26:04-04:00 2017-07-20T12:26:04-04:00 0 Laptop1
# FlashDrive Company1 2017-07-20T12:26:04-04:00 2017-07-20T12:26:04-04:00 1 Laptop2
Upvotes: 1