Mawg

Wildcard search at any nested depth using xml.etree.ElementTree

I have a set of XML files which contain entries like:

   <group name="XXX common string">

      <value val="12" description="a dozen">
         <text>one less than a baker's dozen</text>
      </value>

      <value val="13" description="a baker's dozen">
         <text>One more than a dozen</text>
      </value>

   </group>

   <group name="YYY common string">

      <value val="42" description="the answer">
         <text>What do you get if you multiple 6 by 9?</text>
      </value>

   </group>

Is there any simple way, using import xml.etree.ElementTree as ET and

    parser = ET.XMLParser()
    parser.parser.UseForeignDTD(True)

    if (args.info) or (args.diagnostics):
        print('Parsing input file : ' + inputFileName)

    tree = ET.parse(inputFileName, parser=parser)
    root = tree.getroot()

to search only for <group> elements whose name contains "common string" and which contain a particular value val?

Important: these groups are nested at different depths in different files.

Answers (2)

larsks

This was a little difficult, because your own code won't work with the example data you posted in your question: nothing there contains the string error, there are no id attributes, and your code doesn't appear to search for "a particular value val", which seemed to be one of your requirements. But here are a few ideas...

To find all group elements whose name attribute contains "common string", you could use lxml (the xpath() method used below belongs to lxml; xml.etree.ElementTree trees don't have one) and do something like this:

>>> matching_groups = []
>>> for group in tree.xpath('//group[contains(@name, "common string")]'):
...     matching_groups.append(group)
...

Given your sample data, this would result in:

>>> print('\n'.join(etree.tostring(x, encoding='unicode') for x in matching_groups))
<group name="XXX common string">

      <value val="12" description="a dozen">
         <text>one less than a baker's dozen</text>
      </value>

      <value val="13" description="a baker's dozen">
         <text>One more than a dozen</text>
      </value>

   </group>


<group name="YYY common string">

      <value val="42" description="the answer">
         <text>What do you get if you multiple 6 by 9?</text>
      </value>

   </group>
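
If you want to stay with xml.etree.ElementTree, note that its path support (find/findall/iterfind) is a small XPath subset with no contains() function, so the substring test has to be done in Python. A minimal sketch of the query above under that approach, assuming the sample groups are wrapped in a hypothetical <root> element (with the second group moved into a hypothetical <section> to vary the depth) so the fragment parses as one document:

```python
import xml.etree.ElementTree as ET

# The question's sample groups inside a hypothetical <root>; the second group
# is nested one level deeper (in a hypothetical <section>) to vary the depth.
SAMPLE = """
<root>
  <group name="XXX common string">
    <value val="12" description="a dozen"><text>one less than a baker's dozen</text></value>
    <value val="13" description="a baker's dozen"><text>One more than a dozen</text></value>
  </group>
  <section>
    <group name="YYY common string">
      <value val="42" description="the answer"><text>What do you get if you multiple 6 by 9?</text></value>
    </group>
  </section>
</root>
"""


def groups_with_name_containing(root, fragment):
    """Return every <group> at any depth whose name attribute contains fragment."""
    # iter() walks the whole subtree, so the nesting depth does not matter.
    return [g for g in root.iter('group') if fragment in g.get('name', '')]


root = ET.fromstring(SAMPLE)
matching_groups = groups_with_name_containing(root, 'common string')
for group in matching_groups:
    print(group.get('name'))
```

The "wildcard" here is just Python's substring operator on the attribute value; a regular expression could replace it if a real pattern is needed.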

If you wanted to limit the results to group elements that contain a value element with val="42", you could try:

>>> matching_groups = []
>>> for group in tree.xpath('//group[contains(@name, "common string")][value/@val = "42"]'):
...     matching_groups.append(group)
... 

This would yield:

>>> print('\n'.join(etree.tostring(x, encoding='unicode') for x in matching_groups))
<group name="YYY common string">

      <value val="42" description="the answer">
         <text>What do you get if you multiple 6 by 9?</text>
      </value>

   </group>
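
A rough standard-library equivalent of this second query, as a sketch: the name test is again done in Python, while the val check uses an [@attr='value'] predicate, which ElementTree's path subset does support. The wrapper <root> and <section> elements and the helper name find_groups are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document built from the question's sample groups.
SAMPLE = """
<root>
  <group name="XXX common string">
    <value val="12" description="a dozen"/>
    <value val="13" description="a baker's dozen"/>
  </group>
  <section>
    <group name="YYY common string">
      <value val="42" description="the answer"/>
    </group>
  </section>
</root>
"""


def find_groups(root, name_fragment, val):
    """Groups at any depth whose name contains name_fragment and which have
    a direct <value> child whose val attribute equals val (a string)."""
    matches = []
    for group in root.iter('group'):
        if name_fragment not in group.get('name', ''):
            continue
        # ElementTree's path subset supports [@attr='value'] predicates.
        if group.find(f"value[@val='{val}']") is not None:
            matches.append(group)
    return matches


root = ET.fromstring(SAMPLE)
for group in find_groups(root, 'common string', '42'):
    print(group.get('name'))  # prints: YYY common string
```

The comparison is on the attribute's string value, so val is passed as "42", not 42.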

Mawg

The problems were 1) wildcard matching on the group name, and 2) the fact that the groups were nested at different levels in different files.
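
For the nesting problem, the relevant ElementTree distinction is that findall('group') looks only at direct children, while iter('group') and findall('.//group') search the whole subtree. A small sketch with a hypothetical document that places groups at three different depths:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <group> elements at three different depths.
DOC = """
<root>
  <group name="A error messages"/>
  <module>
    <group name="B error messages"/>
    <sub>
      <group name="C other"/>
    </sub>
  </module>
</root>
"""

root = ET.fromstring(DOC)

# Direct children only: finds just the top-level group.
direct = [g.get('name') for g in root.findall('group')]

# Both of these search every depth below root, in document order.
via_iter = [g.get('name') for g in root.iter('group')]
via_path = [g.get('name') for g in root.findall('.//group')]

# Case-insensitive substring match on the name attribute.
errors = [n for n in via_iter if 'error' in n.lower()]

print(direct)    # ['A error messages']
print(via_iter)  # ['A error messages', 'B error messages', 'C other']
print(errors)    # ['A error messages', 'B error messages']
```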

I implemented this brute-force approach, which builds a dictionary of all error entries found in any group whose name contains "error", anywhere in the file.

I leave it here for posterity and invite more elegant solutions.

    import xml.etree.ElementTree as ET

    parser = ET.XMLParser()
    parser.parser.UseForeignDTD(True)

    tree = ET.parse(inputFileName, parser=parser)
    root = tree.getroot()

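    # Build {message name: {'id': ..., 'description': ...}} from every <group>
    # whose name contains "error" (case-insensitive), at any nesting depth.
    args.errorDefinitions = {}
    for element in tree.iter('group'):  # iter() walks the whole tree
        if 'error' in element.get('name', '').lower():
            children = list(element)  # public API instead of private element._children
            if children:
                for errorMessage in children[0]:
                    args.errorDefinitions[errorMessage.get('name')] = {
                        'id': errorMessage.get('id'),
                        'description': children[0].text,
                    }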
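
A self-contained sketch of the same brute-force logic, wrapped in a function that returns the dictionary instead of storing it on args. The XML layout is a hypothetical reconstruction from what the code reads (the description comes from the first child's text, and each of that child's children supplies name and id attributes); the <entry> and <message> tag names are assumptions, as is stripping whitespace from the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout inferred from the code above. Tag names <entry> and
# <message> are assumptions; only the structure and attributes matter.
DOC = """
<root>
  <group name="Driver ERROR codes">
    <entry>Disk errors<message name="E_NODISK" id="101"/><message name="E_FULL" id="102"/></entry>
  </group>
  <nested>
    <group name="net error codes">
      <entry>Network errors<message name="E_TIMEOUT" id="201"/></entry>
    </group>
  </nested>
  <group name="unrelated"><entry>ignored<message name="X" id="0"/></entry></group>
</root>
"""


def collect_error_definitions(root):
    """Map message name -> {'id', 'description'} for every group, at any depth,
    whose name contains 'error' (case-insensitive)."""
    definitions = {}
    for group in root.iter('group'):
        if 'error' not in group.get('name', '').lower():
            continue
        children = list(group)
        if not children:
            continue
        first = children[0]
        for message in first:
            definitions[message.get('name')] = {
                'id': message.get('id'),
                # Unlike the original, strip surrounding whitespace from the text.
                'description': (first.text or '').strip(),
            }
    return definitions


defs = collect_error_definitions(ET.fromstring(DOC))
print(sorted(defs))  # ['E_FULL', 'E_NODISK', 'E_TIMEOUT']
```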
