Reputation: 29018
Musing over a recently asked question, I started to wonder if there is a really simple way to deal with XML documents in Python. A pythonic way, if you will.
Perhaps I can explain best if i give example: let's say the following - which i think is a good example of how XML is (mis)used in web services - is the response i get from http request to http://www.google.com/ig/api?weather=94043
<xml_api_reply version="1">
<weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
<forecast_information>
<city data="Mountain View, CA"/>
<postal_code data="94043"/>
<latitude_e6 data=""/>
<longitude_e6 data=""/>
<forecast_date data="2010-06-23"/>
<current_date_time data="2010-06-24 00:02:54 +0000"/>
<unit_system data="US"/>
</forecast_information>
<current_conditions>
<condition data="Sunny"/>
<temp_f data="68"/>
<temp_c data="20"/>
<humidity data="Humidity: 61%"/>
<icon data="/ig/images/weather/sunny.gif"/>
<wind_condition data="Wind: NW at 19 mph"/>
</current_conditions>
...
<forecast_conditions>
<day_of_week data="Sat"/>
<low data="59"/>
<high data="75"/>
<icon data="/ig/images/weather/partly_cloudy.gif"/>
<condition data="Partly Cloudy"/>
</forecast_conditions>
</weather>
</xml_api_reply>
After loading/parsing such document, i would like to be able to access the information as simple as say
>>> xml['xml_api_reply']['weather']['forecast_information']['city'].data
'Mountain View, CA'
or
>>> xml.xml_api_reply.weather.current_conditions.temp_f['data']
'68'
From what I saw so far, seems that ElementTree
is the closest to what I dream of. But it's not there, there is still some fumbling to do when consuming XML. OTOH, what I am thinking is not that complicated - probably just thin veneer on top of a parser - and yet it can decrease annoyance of dealing with XML. Is there such a magic? (And if not - why?)
PS. Note I have tried BeautifulSoup
already and while I like its approach, it has real issues with empty <element/>
s - see below in comments for examples.
Upvotes: 27
Views: 26462
Reputation: 14121
lxml has been mentioned. You might also check out lxml.objectify for some really simple manipulation.
>>> from lxml import objectify
>>> tree = objectify.fromstring(your_xml)
>>> tree.weather.attrib["module_id"]
'0'
>>> tree.weather.forecast_information.city.attrib["data"]
'Mountain View, CA'
>>> tree.weather.forecast_information.postal_code.attrib["data"]
'94043'
Upvotes: 19
Reputation: 42648
I highly recommend lxml.etree and xpath to parse and analyse your data. Here is a complete example. I have truncated the xml to make it easier to read.
import lxml.etree
s = """<?xml version="1.0" encoding="utf-8"?>
<xml_api_reply version="1">
<weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
<forecast_information>
<city data="Mountain View, CA"/> <forecast_date data="2010-06-23"/>
</forecast_information>
<forecast_conditions>
<day_of_week data="Sat"/>
<low data="59"/>
<high data="75"/>
<icon data="/ig/images/weather/partly_cloudy.gif"/>
<condition data="Partly Cloudy"/>
</forecast_conditions>
</weather>
</xml_api_reply>"""
tree = lxml.etree.fromstring(s)
for weather in tree.xpath('/xml_api_reply/weather'):
print weather.find('forecast_information/city/@data')[0]
print weather.find('forecast_information/forecast_date/@data')[0]
print weather.find('forecast_conditions/low/@data')[0]
print weather.find('forecast_conditions/high/@data')[0]
Upvotes: 4
Reputation: 29018
I found the following python-simplexml module, which in the attempts of the author to get something close to SimpleXML from PHP is indeed a small wrapper around ElementTree
. It's under 100 lines but seems to do what was requested:
>>> import SimpleXml
>>> x = SimpleXml.parse(urllib.urlopen('http://www.google.com/ig/api?weather=94043'))
>>> print x.weather.current_conditions.temp_f['data']
58
Upvotes: 1
Reputation: 730
The suds project provides a Web Services client library that works almost exactly as you describe -- provide it a wsdl and then use factory methods to create the defined types (and process the responses too!).
Upvotes: 0
Reputation: 7855
You want a thin veneer? That's easy to cook up. Try the following trivial wrapper around ElementTree as a start:
# geetree.py
import xml.etree.ElementTree as ET
class GeeElem(object):
"""Wrapper around an ElementTree element. a['foo'] gets the
attribute foo, a.foo gets the first subelement foo."""
def __init__(self, elem):
self.etElem = elem
def __getitem__(self, name):
res = self._getattr(name)
if res is None:
raise AttributeError, "No attribute named '%s'" % name
return res
def __getattr__(self, name):
res = self._getelem(name)
if res is None:
raise IndexError, "No element named '%s'" % name
return res
def _getelem(self, name):
res = self.etElem.find(name)
if res is None:
return None
return GeeElem(res)
def _getattr(self, name):
return self.etElem.get(name)
class GeeTree(object):
"Wrapper around an ElementTree."
def __init__(self, fname):
self.doc = ET.parse(fname)
def __getattr__(self, name):
if self.doc.getroot().tag != name:
raise IndexError, "No element named '%s'" % name
return GeeElem(self.doc.getroot())
def getroot(self):
return self.doc.getroot()
You invoke it so:
>>> import geetree
>>> t = geetree.GeeTree('foo.xml')
>>> t.xml_api_reply.weather.forecast_information.city['data']
'Mountain View, CA'
>>> t.xml_api_reply.weather.current_conditions.temp_f['data']
'68'
Upvotes: 10
Reputation: 213
I believe that the built in python xml module will do the trick. Look at "xml.parsers.expat"
Upvotes: 1
Reputation: 25281
Take a look at Amara 2, particularly the Bindery part of this tutorial.
It works in a way pretty similar to what you describe.
On the other hand. ElementTree's find*() methods can give you 90% of that and are packaged with Python.
Upvotes: 4
Reputation: 1777
If you haven't already, I'd suggest looking into the DOM API for Python. DOM is a pretty widely used XML interpretation system, so it should be pretty robust.
It's probably a little more complicated than what you describe, but that comes from its attempts to preserve all the information implicit in XML markup rather than from bad design.
Upvotes: -1
Reputation: 6745
If you don't mind using a 3rd party library, then BeautifulSoup will do almost exactly what you ask for:
>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup('''<snip>''')
>>> soup.xml_api_reply.weather.current_conditions.temp_f['data']
u'68'
Upvotes: 3