ellabells

Reputation: 49

Parsing nested XML in Python

I have this XML file:

<?xml version="1.0" ?><XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <ProcessToolName>sample</ProcessToolName>
    <RecipeName>sample</RecipeName>
    <PalletName>sample</PalletName>
    <PalletPosition>sample</PalletPosition>
    <IsControl>sample</IsControl>
    <LoadPosition>sample</LoadPosition>
    <HolderJob>sample</HolderJob>
    <IsSPC>sample</IsSPC>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <ProcessToolName>sample</ProcessToolName>
    <RecipeName>sample</RecipeName>
    <PalletName>sample</PalletName>
    <PalletPosition>sample</PalletPosition>
    <IsControl>sample</IsControl>
    <LoadPosition>sample</LoadPosition>
    <HolderJob>sample</HolderJob>
    <IsSPC>sample</IsSPC>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>

And this is my code for parsing the XML:

for data in xml.getElementsByTagName('TechDataParams'):
    #parse xml
    runnum=data.getElementsByTagName('RunNumber')[0].firstChild.nodeValue
    hold=data.getElementsByTagName('Holder')[0].firstChild.nodeValue
    processtn=data.getElementsByTagName('ProcessToolName')[0].firstChild.nodeValue
    recipedata=data.getElementsByTagName('RecipeName')[0].firstChild.nodeValue
    palletna=data.getElementsByTagName('PalletName')[0].firstChild.nodeValue
    palletposi=data.getElementsByTagName('PalletPosition')[0].firstChild.nodeValue
    control = data.getElementsByTagName('IsControl')[0].firstChild.nodeValue
    loadpos=data.getElementsByTagName('LoadPosition')[0].firstChild.nodeValue
    holderjob=data.getElementsByTagName('HolderJob')[0].firstChild.nodeValue
    spc = data.getElementsByTagName('IsSPC')[0].firstChild.nodeValue
    mestype = data.getElementsByTagName('MeasurementType')[0].firstChild.nodeValue

But when I print each node, I only get one set of 'TechDataParams'. I want to get every 'TechDataParams' block from the XML.

Let me know if my question is unclear.

Upvotes: 1

Views: 644

Answers (3)

Vivek Sable

Reputation: 10223

This can also be done with the lxml.etree module.

  1. The input contains a namespace, i.e. http://tempuri.org/XMLSchemaPalletLoadTechData.xsd
  2. Use the xpath method to find the target TechDataParams tags.
  3. Get the children of each TechDataParams tag and create a dictionary whose keys are the tag names and whose values are the tag text.
  4. Append each dictionary to a list variable, TechDataParams_info.

code:

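from lxml import etree

# content is the XML document from the question, held as a string
root = etree.fromstring(content)
TechDataParams_info = []
# The default namespace is bound to the prefix "a" for the XPath query
for i in root.xpath("//a:XMLSchemaPalletLoadTechData/a:TechDataParams", namespaces={"a": 'http://tempuri.org/XMLSchemaPalletLoadTechData.xsd'}):
    temp = dict()
    for j in i:  # iterating over an element yields its children
        # drop the "{namespace}" prefix from the tag name
        temp[j.tag.split("}", 1)[-1]] = j.text
    TechDataParams_info.append(temp)

print(TechDataParams_info)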

output:

[{'PalletPosition': 'sample', 'HolderJob': 'sample', 'RunNumber': 'sample', 'ProcessToolName': 'sample', 'RecipeName': 'sample', 'IsControl': 'sample', 'PalletName': 'sample', 'LoadPosition': 'sample', 'MeasurementType': 'sample', 'Holder': 'sample', 'IsSPC': 'sample'}, {'PalletPosition': 'sample', 'HolderJob': 'sample', 'RunNumber': 'sample', 'ProcessToolName': 'sample', 'RecipeName': 'sample', 'IsControl': 'sample', 'PalletName': 'sample', 'LoadPosition': 'sample', 'MeasurementType': 'XRF', 'Holder': 'sample', 'IsSPC': 'sample'}]
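
If lxml is not installed, the same four steps can be sketched with the standard library's xml.etree.ElementTree. This is only a minimal sketch under a few assumptions: the XML is held in a string, the sample is trimmed to three fields per block, and the names content, NS and records are illustrative rather than taken from the answer above.

```python
import xml.etree.ElementTree as ET

# Illustrative input: a trimmed copy of the question's XML (assumed to be a string).
content = """<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""

# Bind the document's default namespace to a prefix for the search path.
NS = {"a": "http://tempuri.org/XMLSchemaPalletLoadTechData.xsd"}

root = ET.fromstring(content)
records = []
for params in root.findall("a:TechDataParams", NS):
    # Strip the "{namespace}" prefix from each child tag to get the bare name.
    records.append({child.tag.split("}", 1)[-1]: child.text for child in params})

print(len(records))
print(records[1]["MeasurementType"])
```

The root element is already XMLSchemaPalletLoadTechData, so findall only needs the relative path to its TechDataParams children.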

Upvotes: 0

alecxe

Reputation: 474171

Please don't dive into parsing XML with minidom unless you want to pull your hair out.

I would use the xmltodict module here. One line gives you a list of dicts with all the data you need:

import xmltodict

data = """your xml here"""

data = xmltodict.parse(data)['XMLSchemaPalletLoadTechData']['TechDataParams']
for params in data:
    print(dict(params))

Prints:

{u'PalletPosition': u'sample', u'HolderJob': u'sample', u'RunNumber': u'sample', u'ProcessToolName': u'sample', u'RecipeName': u'sample', u'IsControl': u'sample', u'PalletName': u'sample', u'LoadPosition': u'sample', u'MeasurementType': u'sample', u'Holder': u'sample', u'IsSPC': u'sample'}
{u'PalletPosition': u'sample', u'HolderJob': u'sample', u'RunNumber': u'sample', u'ProcessToolName': u'sample', u'RecipeName': u'sample', u'IsControl': u'sample', u'PalletName': u'sample', u'LoadPosition': u'sample', u'MeasurementType': u'XRF', u'Holder': u'sample', u'IsSPC': u'sample'}

Upvotes: 1

Stephen Lin

Reputation: 4912

Here is an example for you. Replace file_path with your own path.

I replaced the values of RunNumber with 001 and 002.

#!/usr/bin/python
# -*- coding: utf-8 -*-

from xml.dom import minidom

file_path = 'C:\\temp\\test.xml'

doc = minidom.parse(file_path)
TechDataParams = doc.getElementsByTagName('TechDataParams')
for t in TechDataParams:
    num = t.getElementsByTagName('RunNumber')[0]
    print('num is  ' + num.firstChild.data)

OUTPUT:

num is  001
num is  002
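
To collect every field of every TechDataParams block rather than only RunNumber, the loop above can build one dict per block and append it to a list. This is a minimal sketch, assuming the XML is parsed from a string with minidom.parseString instead of a file; xml_text, FIELDS and rows are illustrative names, and the sample is trimmed to two fields. Storing each block inside the loop matters: if the values are assigned to plain variables and printed after the loop ends, only the last block is visible, which may be what the question ran into (an assumption, since the printing code is not shown).

```python
from xml.dom import minidom

# Illustrative input: a trimmed copy of the question's XML with distinct RunNumber values.
xml_text = """<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>001</RunNumber>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>002</RunNumber>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""

# Child tags to read from each block (extend with the remaining tag names as needed).
FIELDS = ("RunNumber", "MeasurementType")

doc = minidom.parseString(xml_text)
rows = []
for block in doc.getElementsByTagName("TechDataParams"):
    row = {}
    for name in FIELDS:
        node = block.getElementsByTagName(name)[0]
        # firstChild is None for an empty element, so guard before reading .data
        row[name] = node.firstChild.data if node.firstChild else None
    rows.append(row)  # keep this block's values before the next iteration

for row in rows:
    print(row)
```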

Upvotes: 0
