Reputation: 6361
I need to load an XML file and convert the contents into an object-oriented Python structure. I want to take this:
<main>
<object1 attr="name">content</object>
</main>
And turn it into something like this:
main
main.object1 = "content"
main.object1.attr = "name"
The XML data will have a more complicated structure than that and I can't hard code the element names. The attribute names need to be collected when parsing and used as the object properties.
How can I convert XML data into a Python object?
Upvotes: 47
Views: 60890
Reputation: 2185
I would suggest xsData (https://xsdata.readthedocs.io/en/latest/)
# Parse XML
from pathlib import Path
from tests.fixtures.primer import PurchaseOrder
from xsdata.formats.dataclass.parsers import XmlParser
xml_string = Path("tests/fixtures/primer/sample.xml").read_text()
parser = XmlParser()
order = parser.from_string(xml_string, PurchaseOrder)
order.bill_to
Upvotes: 1
Reputation: 14121
David Mertz's gnosis.xml.objectify would seem to do this for you. Documentation's a bit hard to come by, but there are a few IBM articles on it, including this one (text only version).
from gnosis.xml import objectify
xml = "<root><nodes><node>node 1</node><node>node 2</node></nodes></root>"
root = objectify.make_instance(xml)
print root.nodes.node[0].PCDATA # node 1
print root.nodes.node[1].PCDATA # node 2
Creating xml from objects in this way is a different matter, though.
Upvotes: 4
Reputation: 58664
It's worth looking at lxml.objectify
.
xml = """<main>
<object1 attr="name">content</object1>
<object1 attr="foo">contenbar</object1>
<test>me</test>
</main>"""
from lxml import objectify
main = objectify.fromstring(xml)
main.object1[0] # content
main.object1[1] # contenbar
main.object1[0].get("attr") # name
main.test # me
Or the other way around to build xml structures:
item = objectify.Element("item")
item.title = "Best of python"
item.price = 17.98
item.price.set("currency", "EUR")
order = objectify.Element("order")
order.append(item)
order.item.quantity = 3
order.price = sum(item.price * item.quantity for item in order.item)
import lxml.etree
print(lxml.etree.tostring(order, pretty_print=True))
Output:
<order>
<item>
<title>Best of python</title>
<price currency="EUR">17.98</price>
<quantity>3</quantity>
</item>
<price>53.94</price>
</order>
Upvotes: 65
Reputation: 91545
I've been recommending this more than once today, but try Beautiful Soup (easy_install BeautifulSoup).
from BeautifulSoup import BeautifulSoup
xml = """
<main>
<object attr="name">content</object>
</main>
"""
soup = BeautifulSoup(xml)
# look in the main node for object's with attr=name, optionally look up attrs with regex
my_objects = soup.main.findAll("object", attrs={'attr':'name'})
for my_object in my_objects:
# this will print a list of the contents of the tag
print my_object.contents
# if only text is inside the tag you can use this
# print tag.string
Upvotes: 9
Reputation: 2698
#@Stephen:
#"can't hardcode the element names, so I need to collect them
#at parse and use them somehow as the object names."
#I don't think thats possible. Instead you can do this.
#this will help you getting any object with a required name.
import BeautifulSoup
class Coll(object):
"""A class which can hold your Foo clas objects
and retrieve them easily when you want
abstracting the storage and retrieval logic
"""
def __init__(self):
self.foos={}
def add(self, fooobj):
self.foos[fooobj.name]=fooobj
def get(self, name):
return self.foos[name]
class Foo(object):
"""The required class
"""
def __init__(self, name, attr1=None, attr2=None):
self.name=name
self.attr1=attr1
self.attr2=attr2
s="""<main>
<object name="somename">
<attr name="attr1">value1</attr>
<attr name="attr2">value2</attr>
</object>
<object name="someothername">
<attr name="attr1">value3</attr>
<attr name="attr2">value4</attr>
</object>
</main>
"""
#
soup=BeautifulSoup.BeautifulSoup(s)
bars=Coll()
for each in soup.findAll('object'):
bar=Foo(each['name'])
attrs=each.findAll('attr')
for attr in attrs:
setattr(bar, attr['name'], attr.renderContents())
bars.add(bar)
#retrieve objects by name
print bars.get('somename').__dict__
print '\n\n', bars.get('someothername').__dict__
output
{'attr2': 'value2', 'name': u'somename', 'attr1': 'value1'}
{'attr2': 'value4', 'name': u'someothername', 'attr1': 'value3'}
Upvotes: 1
Reputation: 5652
There are three common XML parsers for python: xml.dom.minidom, elementree, and BeautifulSoup.
IMO, BeautifulSoup is by far the best.
http://www.crummy.com/software/BeautifulSoup/
Upvotes: 0