lxml: Get field after attribute value

Question

I'm parsing XML files and I have a follow-on question from here. From the below XML field:

I would like to pull out the word 'anticipated' and the number. In the files that I have, 'enrollment type'/'enrollment' stays the same between files, but 'anticipated' does not (e.g. sometimes it says 'actual' or something else), and the number also varies.

The code that I tried:

from lxml import etree
import sys
import glob

list_to_get = ['enrollment']
list_of_files = glob.glob('*xml')
for each_file in list_of_files:
    tree = etree.parse(each_file)
    root = tree.getroot()
    for node in root.xpath("//" + 'enrollment'):
        # leaf elements only: the node itself or descendants with no child elements
        for e in node.xpath('descendant-or-self::*[not(*)]'):
            if e.attrib:
                print(e.attrib)
                print(e.find('type'))
                print(e.find('.//type'))
                print(e.attrib['type'])
                print(e.find(e.attrib['type']).text)

Using this method, I can pull out the type (e.g. anticipated/actual), but I can't find any way to pull out the number. If someone has an idea of the print line I should use, I would appreciate it.

I did look at some similar questions (e.g. here) but their suggestions don't seem to work for me.

Manjit Ullal · Accepted Answer

You are doing all the right things, just don't overcomplicate it. Put simply: get the root node using xpath, iterate over each child node using getiterator (deprecated in current lxml in favour of iter), and read the value of each child from its .text attribute.

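Example, for a tree like:

parent
    child
    child

for i in parent.iter():  # iter() is the current name for the deprecated getiterator()
    print(i.tag)   # tag name; parent itself is yielded first, then each child
    print(i.text)  # text content of that element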
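
Applied to the question, a minimal sketch could look like the following. The question's actual XML is not shown here, so the sample element (an enrollment tag with a type attribute and a number as its text) and the helper name enrollment_info are assumptions made purely for illustration. The point is that the attribute value comes from .get('type') and the number comes from .text on the same element, so no further find() call is needed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback with the same calls used below, in case lxml is not installed.
    import xml.etree.ElementTree as etree

# Hypothetical sample: the shape of this element is an assumption,
# not the XML from the question.
SAMPLE = """<study>
  <enrollment type="Anticipated">30</enrollment>
</study>"""


def enrollment_info(root):
    """Return (type attribute, text) pairs for every <enrollment> element."""
    return [
        (node.get("type"), (node.text or "").strip())
        for node in root.iter("enrollment")
    ]


root = etree.fromstring(SAMPLE)
results = enrollment_info(root)
for kind, number in results:
    print(kind, number)
```

For files on disk, the same helper could be called with etree.parse(each_file).getroot() inside the glob loop from the question.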
