cloudcrypt

Reputation: 809

Python XML Parsing with Tag Specific Info

I have an XML file with a structure like the following:

<?xml version="1.0"?>
<title>
  <ch bk="Book1" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book1" num="2">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book2" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
</title>

Is there a way in Python to access the content of an individual ver for a specific book and ch num (for example, ver num=2 of ch num=2 of bk=Book1)? I've looked at a few classes in the xml modules that parse XML, but they select elements by tag name, and I don't see where I can pass attribute values such as num and bk. Thanks a lot!

Upvotes: 1

Views: 1663

Answers (4)

Gilles Quénot

Reputation: 185861

You can access the element with an XPath expression:

XPath explained

// stands for a relative, recursive search (it matches nodes at any depth)

'//ch[@num="1"][@bk="Book1"]/ver[@num="1"]'
#  ^      ^        ^           ^      ^
# ch node |        |           |      |
#  + attribute num = "1"       |      |
#  + AND bk = "Book1"          |      |
#                           ver node  |
#                           + attribute num = "1"

Python code:

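from lxml import etree

# etree.parse() accepts a filename directly, so no file handle is left open
tree = etree.parse("/tmp/xml.xml")
# xpath() returns a list of matching text nodes; [0] takes the first one
print(tree.xpath('//ch[@num="1"][@bk="Book1"]/ver[@num="1"]/text()')[0])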
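If installing lxml is not an option, the standard library's `xml.etree.ElementTree` supports a limited XPath subset that should be enough for this query. This is a minimal sketch that assumes the question's sample data is held in a string, with the XML declaration fixed to end in `?>`. ElementTree has no `text()` step, and a path starting with `//` is not accepted on an element, so the path begins with `.//` and the text is read from `.text`.

```python
import xml.etree.ElementTree as ET

# Sample data from the question (declaration corrected to "<?xml ...?>").
XML = """<?xml version="1.0"?>
<title>
  <ch bk="Book1" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book1" num="2">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book2" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
</title>"""

root = ET.fromstring(XML)

# Same selection as the lxml expression: chained [@attr='value'] predicates
# are ANDed together, and ".//" searches at any depth below root.
ver = root.find(".//ch[@num='1'][@bk='Book1']/ver[@num='1']")

# find() returns None when nothing matches, so check before using .text.
print(ver.text if ver is not None else "not found")
```

Because `find()` returns `None` instead of raising, a missing chapter or verse can be handled without catching an `IndexError`.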

Upvotes: 1

Azmi Kamis

Reputation: 901

This way, you get the ver elements themselves (not just their text), so you can read both their attributes and their content.

import xml.etree.ElementTree as etree

tree = etree.ElementTree(file='input.xml')

# inputs
num = '1'
bk = 'Book2'

# list comprehension (assumes the num/bk pair is unique for each ch)
vers = [ch.findall('ver')
        for ch in tree.findall('ch')
        if ch.attrib['num'] == num and ch.attrib['bk'] == bk][0]

# loop over the results (Python 3 print function)
for ver in vers:
    print('num={0} text={1}'.format(ver.attrib['num'], ver.text))
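The `[0]` at the end of the comprehension raises `IndexError` when no ch matches. A minimal variant, assuming you would rather get an empty list back, uses `next()` with a default; the `find_vers` helper name is hypothetical and the XML is a trimmed copy of the question's sample.

```python
import xml.etree.ElementTree as ET

# Trimmed sample from the question (declaration corrected).
XML = """<?xml version="1.0"?>
<title>
  <ch bk="Book1" num="1">
    <ver num="1">ver1 content</ver>
  </ch>
  <ch bk="Book2" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
</title>"""


def find_vers(root, num, bk):
    """Return the <ver> elements of the first <ch> matching num and bk,
    or an empty list if there is no such chapter (hypothetical helper)."""
    ch = next(
        (ch for ch in root.findall('ch')
         if ch.get('num') == num and ch.get('bk') == bk),
        None,
    )
    # Compare with None explicitly: an Element with no children is falsy.
    return ch.findall('ver') if ch is not None else []


root = ET.fromstring(XML)
for ver in find_vers(root, '1', 'Book2'):
    print('num={0} text={1}'.format(ver.get('num'), ver.text))
```

Using `ch.get()` instead of `ch.attrib[...]` also avoids a `KeyError` if some ch lacks a num or bk attribute.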

Upvotes: 1

Vivek Sable

Reputation: 10221

Yes, you can use XPath to get the target tag.

>>> from lxml import etree
>>> fp = open("test.html")
>>> tree = etree.parse(fp)
>>> r = tree.xpath('//ch[@num=2][@bk="Book1"]/ver/text()')
>>> r
['ver1 content', 'ver2 content']
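The same chapter query also works with the standard library's `xml.etree.ElementTree`, and keying the returned verses by their num attribute turns picking one verse into a dictionary lookup. A minimal sketch, assuming the data is held in a string; in ElementTree's predicates the attribute value is written as a quoted string (`'2'`), whereas lxml's full XPath above also accepted the unquoted number `2`.

```python
import xml.etree.ElementTree as ET

# Trimmed sample from the question (declaration corrected).
XML = """<?xml version="1.0"?>
<title>
  <ch bk="Book1" num="1">
    <ver num="1">ver1 content</ver>
  </ch>
  <ch bk="Book1" num="2">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
</title>"""

root = ET.fromstring(XML)

# All <ver> children of chapter 2 of Book1, keyed by their num attribute.
verses = {
    ver.get('num'): ver.text
    for ver in root.iterfind(".//ch[@num='2'][@bk='Book1']/ver")
}
print(verses)           # {'1': 'ver1 content', '2': 'ver2 content'}
print(verses.get('2'))  # ver2 content
```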

Upvotes: 1

Alex Martelli

Reputation: 882781

A simple approach using the Python standard library's xml.etree.ElementTree:

import xml.etree.ElementTree as ET
tree = ET.parse('yourfile.xml')

def locate(chnum, bk, vernum):
    # attribute values are strings, so call e.g. locate('2', 'Book1', '2')
    for ch in tree.findall('ch'):
        if ch.get('num') != chnum: continue
        if ch.get('bk') != bk: continue
        for ver in ch.findall('ver'):
            if ver.get('num') != vernum: continue
            return ver.text
    return None  # no such chapter/book/version combo found
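`locate` rescans the chapters on every call. If you need many lookups against the same file, one option is to walk the tree once and build a dictionary keyed by (bk, chnum, vernum). This is a sketch under assumptions: the `build_index` name is hypothetical, the XML is a trimmed copy of the question's sample, and each combination is assumed to occur only once (a later duplicate would overwrite an earlier one).

```python
import xml.etree.ElementTree as ET

# Trimmed sample from the question (declaration corrected).
XML = """<?xml version="1.0"?>
<title>
  <ch bk="Book1" num="2">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book2" num="1">
    <ver num="1">ver1 content</ver>
  </ch>
</title>"""


def build_index(root):
    """Map (bk, chnum, vernum) -> verse text (hypothetical helper).
    Keys stay strings, matching what ch.get()/ver.get() return."""
    index = {}
    for ch in root.findall('ch'):
        for ver in ch.findall('ver'):
            index[(ch.get('bk'), ch.get('num'), ver.get('num'))] = ver.text
    return index


index = build_index(ET.fromstring(XML))
print(index.get(('Book1', '2', '2')))  # ver2 content
print(index.get(('Book3', '1', '1')))  # None: no such combination
```

Building the index costs one pass over the document; after that each lookup is a constant-time dictionary access instead of a scan.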

Upvotes: 1
