rodlozarg
rodlozarg

Reputation: 57

Parse XML with childs that have different tags in Python

I am trying to parse following xml data from a file with python for print only the elements with tag "zip-code" with his attribute name

<response status="success" code="19"><result total-count="1" count="1">
  <address>
    <entry name="studio">
      <zip-code>14407</zip-code>
      <description>Nothing</description>
    </entry>
    <entry name="mailbox">
      <zip-code>33896</zip-code>
      <description>Nothing</description>
    </entry>
    <entry name="garage">
      <zip-code>33746</zip-code>
      <description>Tony garage</description>
    </entry>
    <entry name="playstore">
      <url>playstation.com</url>
      <description>game download</description>
    </entry>
    <entry name="gym">
      <zip-code>33746</zip-code>
      <description>Getronics NOC subnet 2</description>
    </entry>
    <entry name="e-cigars">
      <url>vape.com/24</url>
      <description>vape juices</description>
    </entry>
   </address>
</result></response>

The python code that I am trying to run is

from xml.etree import ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()
items = root.iter('entry')
for item in items:
    zip = item.find('zip-code').text
    names = (item.attrib)
    print(' {} {} '.format(
        names, zip
    ))

However it fails once it gets to the items without "zip-code" tag.

How I could make this run? Thanks in advance

Upvotes: 1

Views: 96

Answers (3)

sammywemmy
sammywemmy

Reputation: 28649

As @AmitaiIrron suggested, xpath can help here.

This code searches the document for element named zip-code, and pings back to get the parent of that element. From there, you can get the name attribute, and pair with the text from zip-code element

for ent in root.findall(".//zip-code/.."):
    print(ent.attrib.get('name'), ent.find('zip-code').text)

studio 14407
mailbox 33896
garage 33746
gym 33746

OR

{ent.attrib.get('name') : ent.find('zip-code').text 
 for ent in root.findall(".//zip-code/..")}

{'studio': '14407', 'mailbox': '33896', 'garage': '33746', 'gym': '33746'}

Upvotes: 2

Aaron
Aaron

Reputation: 202

If you have no problem using regular expressions, the following works just fine:

import re

file = open('file.xml', 'r').read()

pattern = r'name="(.*?)".*?<zip-code>(.*?)<\/zip-code>'
matches = re.findall(pattern, file, re.S)

for m in matches:
    print("{} {}".format(m[0], m[1]))

and produces the result:

studio 14407
mailbox 33896
garage 33746
aystore 33746

Upvotes: 0

Amitai Irron
Amitai Irron

Reputation: 2055

Your loop should look like this:

# Find all <entry> tags in the hierarchy
for item in root.findall('.//entry'):
    # Try finding a <zip-code> child
    zipc = item.find('./zip-code')
    # If found a child, print data for it
    if zipc is not None:
        names = (item.attrib)
        print(' {} {} '.format(
            names, zipc.text
        ))

It's all a matter of learning to use xpath properly when searching through the XML tree.

Upvotes: 1

Related Questions