Techno04335
Techno04335

Reputation: 1445

Python: How to Loop thoough every XML node and print values based on list

For Validation Purposes: How can I search thorough a whole XML node by node(even children) as below:

XML File:

<Summary>
<Hardware_Info>
    <HardwareType>FlashDrive</HardwareType>
    <ManufacturerDetail>
            <ManufacturerCompany>Company1</ManufacturerCompany>
            <ManufacturerDate>2017-07-20T12:26:04-04:00</ManufacturerDate>
            <ModelCode>4BR6282</ModelCode>
    </ManufacturerDetail>
    <ActivationDate>2017-07-20T12:26:04-04:00</ActivationDate>
</Hardware_Info>
<DeviceConnectionInfo>
    <Device>
        <Index>0</Index>
        <Name>Laptop1</Name>
        <Status>Installed</Status>
    </Device>
    <Device>
        <Index>1</Index>
        <Name>Laptop2</Name>
        <Status>Installed</Status>
    </Device>
</DeviceConnectionInfo>
</Summary>

and search for the values based on matching columns of a particular Table. For the sake of an example the table is as is:

TABLE:

HardwareType    ManufacturerCompany    ManufacturerDate             ActivationDate              Device.Index        Name
FlashDrive      Company1               2017-07-20T12:26:04-04:00    2017-07-20T12:26:04-04:00   0                   Laptop1
FlashDrive      Company2               2017-07-20T12:26:04-04:00    2017-07-20T12:26:04-04:00   1                   Laptop2

In this case I would have a column list of:

HardwareType, ManufacturerCompany, ManufacturerDate, ActivationDate, Device.Index, Name

For my final result I would like to print the values of the table column names along with the values of the table names found on the xml. For example similar like the original table (assuming validation is fine):

OUTPUT RESULT :

 HardwareType   ManufacturerCompany    ManufacturerDate             ActivationDate              Device.Index        Name
    FlashDrive      Company1               2017-07-20T12:26:04-04:00    2017-07-20T12:26:04-04:00   0                   Laptop1
    FlashDrive      Company2               2017-07-20T12:26:04-04:00    2017-07-20T12:26:04-04:00   1                   Laptop2

CURRENT IMPLEMENTATION:

For example I am able to get the list of the column names of the table, however but so far the best to my nkowledge to implement this is:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("/test.xml")
root = tree.getroot()

f = open('/test.csv', 'w')

csvwriter = csv.writer(f)

count = 0

head = ['ManufacturerCompany','ManufacturerDate',...]

csvwriter.writerow(head)

for time in root.findall('Summary'):
     row = []
     job_name = time.find('ManufacturerDetail').find('ManufacturerCompany').text
     row.append(job_name)
     job_name = time.find('ManufacturerDetail').find('ManufacturerDate').text
     row.append(job_name)
     csvwriter.writerow(row)
f.close()

However, this implementation does not have the loop each functionality I want for the output. Any guidance or suggestions for implementation would be great.

Thanks

Upvotes: 0

Views: 984

Answers (1)

Parfait
Parfait

Reputation: 107567

Consider XSLT, the special purpose language designed to transform XML files into other XML, HTML (which mostly known for), but also text files (TXT/CSV) with its method="text". Specifically, walk down to the Device node level and bring over the ancestor items.

Python's third-party lxml module can run XSLT 1.0 scripts. However, XSLT is portable and any XSLT processor can run such code including Unix's (Linux/Mac) available xsltproc.

XSLT (save as .xsl file, a special .xml file; &#xa; is the line break entity)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="delimiter">,</xsl:param>

  <xsl:template match="/Summary">
    <xsl:text>HardwareType,ManufacturerCompany,ManufacturerDate,ActivationDate,Device.Index,Name&#xa;</xsl:text>    
    <xsl:apply-templates select="DeviceConnectionInfo"/>    
  </xsl:template>

  <xsl:template match="DeviceConnectionInfo">
    <xsl:apply-templates select="Device"/>    
  </xsl:template>

  <xsl:template match="Device">
    <xsl:value-of select="concat(ancestor::Summary/Hardware_Info/HardwareType, $delimiter,
                                 ancestor::Summary/Hardware_Info/ManufacturerDetail/ManufacturerCompany, $delimiter,
                                 ancestor::Summary/Hardware_Info/ManufacturerDetail/ManufacturerDate, $delimiter,
                                 ancestor::Summary/Hardware_Info/ActivationDate, $delimiter,
                                 Index, $delimiter,
                                 Name)"/><xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

Python (with lxml)

import lxml.etree as et

# LOAD XML AND XSL
doc = et.parse('input.xml')
xsl = et.parse('xslt_script.xsl')

# TRANSFORM INPUT TO STRING
transform = et.XSLT(xsl)    
result = str(transform(doc))

# SAVE TO FILE
with open('output.csv', 'w') as f:
    f.write(result)

Python (one-line command call to xsltproc)

from subprocess import Popen

proc = Popen(['xsltproc -o output.csv xslt_script.xsl input.xml'], 
             shell=True, cwd='/path/to/working/directory')

Output

# HardwareType  ManufacturerCompany ManufacturerDate    ActivationDate  Device.Index    Name
# FlashDrive    Company1    2017-07-20T12:26:04-04:00   2017-07-20T12:26:04-04:00   0   Laptop1
# FlashDrive    Company1    2017-07-20T12:26:04-04:00   2017-07-20T12:26:04-04:00   1   Laptop2

Upvotes: 1

Related Questions