Reputation: 809
I have an XML file with a structure like the following:
<?xml version="1.0"?>
<title>
<ch bk="Book1" num="1">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
<ch bk="Book1" num="2">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
<ch bk="Book2" num="1">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
</title>
Is there a way I can access individual ver content for a specific ch num and book in Python? (For example, access ver num=2 of ch num=2 of bk=Book1.)
I've looked at a few xml module classes that parse XML, but they go by tagName, and I don't see where I can pass in values such as num, bk, and ch.
Thanks a lot!
Upvotes: 1
Views: 1663
Reputation: 185861
You can select the element with an XPath expression:
// selects matching nodes at any depth in the document (a recursive search)
'//ch[@num="1"][@bk="Book1"]/ver[@num="1"]'
# ch               the ch node
# [@num="1"]       ... whose num attribute is "1"
# [@bk="Book1"]    ... AND whose bk attribute is "Book1"
# /ver             a ver child of that ch node
# [@num="1"]       ... whose num attribute is "1"
from lxml import etree
fp = open("/tmp/xml.xml")
tree = etree.parse(fp)
print(tree.xpath('//ch[@num="1"][@bk="Book1"]/ver[@num="1"]/text()')[0])
Upvotes: 1
Reputation: 901
This way, you can access the ver nodes as elements.
import xml.etree.ElementTree as etree
tree = etree.ElementTree(file='input.xml')
# inputs
num = '1'
bk = 'Book2'
# list comprehension (assumes the num/bk pair is unique per ch)
vers = [ch.findall('ver')
        for ch in tree.findall('ch')
        if ch.attrib['num'] == num and ch.attrib['bk'] == bk][0]
# loop over the results
for ver in vers:
    print('num={0} text={1}'.format(ver.attrib['num'], ver.text))
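Note that the [0] index above raises IndexError when no ch matches. A minimal sketch of a variant that returns an empty list instead, assuming the sample XML from the question; the function name verses_for and the inline SAMPLE string are hypothetical, used only for illustration:

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question, inlined so the sketch is self-contained.
SAMPLE = """<title>
<ch bk="Book1" num="1">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
<ch bk="Book1" num="2">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
<ch bk="Book2" num="1">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
</title>"""

root = ET.fromstring(SAMPLE)

def verses_for(root, bk, num):
    """Return the ver elements of the first matching ch, or [] if none matches."""
    # next() with a default avoids the IndexError that [0] would raise.
    ch = next((c for c in root.findall('ch')
               if c.get('bk') == bk and c.get('num') == num), None)
    return [] if ch is None else ch.findall('ver')

for ver in verses_for(root, 'Book2', '1'):
    print('num={0} text={1}'.format(ver.get('num'), ver.text))
```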
Upvotes: 1
Reputation: 10221
Yes, you can use XPath to select the target elements.
>>> from lxml import etree
>>> fp = open("test.html")
>>> tree = etree.parse(fp)
>>> r = tree.xpath('//ch[@num=2][@bk="Book1"]/ver/text()')
>>> r
['ver1 content', 'ver2 content']
Upvotes: 1
Reputation: 882781
A simple approach using the Python standard library's xml.etree.ElementTree:
import xml.etree.ElementTree as ET
tree = ET.parse('yourfile.xml')
def locate(chnum, bk, vernum):
    for ch in tree.findall('ch'):
        if ch.get('num') != chnum: continue
        if ch.get('bk') != bk: continue
        for ver in ch.findall('ver'):
            if ver.get('num') != vernum: continue
            return ver.text
    return None  # no such chapter/book/version combo found
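The nested loops can probably be collapsed into a single path expression, since ElementTree supports a limited XPath subset that includes attribute predicates. A minimal sketch, assuming the sample XML from the question; the name find_verse and the inline SAMPLE string are hypothetical, and the simple string formatting assumes the attribute values contain no quote characters:

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question, inlined so the sketch is self-contained.
SAMPLE = """<title>
<ch bk="Book1" num="1">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
<ch bk="Book1" num="2">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
<ch bk="Book2" num="1">
<ver num="1">ver1 content</ver>
<ver num="2">ver2 content</ver>
</ch>
</title>"""

root = ET.fromstring(SAMPLE)

def find_verse(root, bk, chnum, vernum):
    """Return the text of the matching ver element, or None if there is none."""
    # Chained [@attr='value'] predicates act as a logical AND.
    # Assumption: bk, chnum and vernum contain no single quotes.
    path = "ch[@bk='{0}'][@num='{1}']/ver[@num='{2}']".format(bk, chnum, vernum)
    ver = root.find(path)
    return ver.text if ver is not None else None

print(find_verse(root, 'Book1', '2', '2'))
```

Attribute values are compared as strings here, so the chapter and verse numbers are passed as '2' rather than 2.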
Upvotes: 1