Issue Getting Nested Elements in xml File using minidom

Question

I'm trying to parse an "xml" file in python for a project.

I want the code to parse through the xml and grab information for each Procedure. This information will be returned as a python dictionary.

Specifically, I will traverse down through each Procedure element and get information on its Data# names and types.

Currently, my code is as below.

The issue is that Data2 is not of the right object type, so I can't traverse into the Variable layer.
I don't understand why I can't keep using getElementsByTagName to go down through each layer.

In the full code I'll be doing this for each Data#, and I should expect 'none' or empty nodes to be specified for a Procedure. The code should handle that (I'm not sure how to handle the empty case other than checking if Data2Element). It's fine if the suggested solution uses another methodology.

Hence the question is: how should I handle empty nodes in an xml document in python?

Note: I have no control over the file format. I have the 'standard' python 3.3 modules, which include xml.dom and xml.etree, and I also have Beautiful Soup (but no lxml). I cannot install 'lxml' or anything else that's not already installed. I'm happy to switch to one of the other installed modules if that's needed for my solution.

filename = 'TestProc.xml'
from xml.dom import minidom

xmldoc = minidom.parse(filename)

procedureList = xmldoc.getElementsByTagName('Procedure')

varName=[]
varType=[]
for procElement in procedureList:
    Data2 = procElement.getElementsByTagName('Data2')
    varElements = Data2.getElementsByTagName('Variable')
    for varElemTmp in varElements:
        varName.append(varElemTmp.getAttribute('name'))
        varType.append(varElemTmp.getAttribute('type'))

Where TestProc.xml is the following.




        

        

        


    
        
            Description1.
            
            
            
            
            
Junk1

        
        
            Description2.
            
            
                
                        Description3
                    
            
            
                
                    Description4
                
                
                    Description5
                
            
            
                
                
            
            
Junk2

Robᵩ · Accepted Answer

Data2 is a list of elements (getElementsByTagName returns a NodeList), not a single element, so you cannot call getElementsByTagName on it directly. You could modify your code like so:

for procElement in procedureList:
    ListOfData2 = procElement.getElementsByTagName('Data2')
    for Data2 in ListOfData2:
        varElements = Data2.getElementsByTagName('Variable')
        for varElemTmp in varElements:
            varName.append(varElemTmp.getAttribute('name'))
            varType.append(varElemTmp.getAttribute('type'))
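
On the empty-node part of the question: with the nested loops, a Procedure that has no Data2, or a Data2 that has no Variable children, simply produces an empty NodeList, and the loop body runs zero times. A minimal sketch of that behaviour follows. The SAMPLE document, the Root element, and the Procedure name attribute are hypothetical stand-ins, since the real TestProc.xml layout is only assumed here; collect_variables is an illustrative helper, not part of minidom.

```python
from xml.dom import minidom

# Hypothetical sample: the real TestProc.xml layout is assumed, not known.
SAMPLE = """<Root>
  <Procedure name="P1">
    <Data2/>
  </Procedure>
  <Procedure name="P2">
    <Data2>
      <Variable name="a" type="int"/>
      <Variable name="b" type="float"/>
    </Data2>
  </Procedure>
  <Procedure name="P3"/>
</Root>"""


def collect_variables(xml_text, data_tag="Data2"):
    """Return one list of (name, type) pairs per Procedure element."""
    doc = minidom.parseString(xml_text)
    result = []
    for proc in doc.getElementsByTagName("Procedure"):
        pairs = []
        # An empty NodeList is iterated zero times, so a missing or empty
        # Data2 needs no special-case check.
        for data in proc.getElementsByTagName(data_tag):
            for var in data.getElementsByTagName("Variable"):
                pairs.append((var.getAttribute("name"),
                              var.getAttribute("type")))
        result.append(pairs)
    return result


per_procedure = collect_variables(SAMPLE)
```

Here P1 (empty Data2) and P3 (no Data2 at all) both end up with an empty list, while P2 yields its two variables.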

If you do switch to ElementTree, you can save yourself some looping by using its path syntax, which supports a limited subset of XPath:

filename = 'TestProc.xml'
import xml.etree.ElementTree as ET

xmldoc = ET.parse(filename)

variables = xmldoc.findall(".//Procedure/Data2/Variable")

varName=[e.get('name') for e in variables]
varType=[e.get('type') for e in variables]

print(varName, varType)
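
Since the question asks for a dictionary per Procedure, the ElementTree approach could be extended along these lines. This is only a sketch under assumptions: the SAMPLE document, the Root element, the Procedure name attribute used as the dictionary key, and the procedures_to_dict helper are all hypothetical, and the Data# elements are assumed to be direct children of Procedure. findall returns an empty list when nothing matches, so missing or empty Data# nodes become empty lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real TestProc.xml layout is assumed, not known.
SAMPLE = """<Root>
  <Procedure name="P1">
    <Data1><Variable name="x" type="int"/></Data1>
    <Data2/>
  </Procedure>
  <Procedure name="P2">
    <Data2>
      <Variable name="a" type="int"/>
      <Variable name="b" type="float"/>
    </Data2>
  </Procedure>
</Root>"""


def procedures_to_dict(xml_text, data_tags=("Data1", "Data2")):
    """Map each Procedure to {data tag: [(name, type), ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for index, proc in enumerate(root.iter("Procedure")):
        # Fall back to a positional key if the assumed name attribute is absent.
        key = proc.get("name", "Procedure%d" % index)
        result[key] = {}
        for tag in data_tags:
            # No match gives an empty list rather than an error.
            variables = proc.findall("%s/Variable" % tag)
            result[key][tag] = [(v.get("name"), v.get("type"))
                                for v in variables]
    return result


procedures = procedures_to_dict(SAMPLE)
```

To read from the file instead of a string, ET.parse(filename).getroot() can replace the ET.fromstring call.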
