Reputation: 1511
I'm parsing XML files and I have a follow-on question from here. From the below XML field:
<enrollment type="Anticipated">30</enrollment>
I would like to pull out the word 'Anticipated' and the number. Across my files, the element name 'enrollment' and the attribute name 'type' stay the same, but the attribute value does not (sometimes it is 'Actual' or something else), and neither does the number.
The code that I tried:
from lxml import etree
import sys
import glob

list_to_get = ['enrollment']
list_of_files = glob.glob('*xml')
for each_file in list_of_files:
    # try:
    tree = etree.parse(each_file)
    root = tree.getroot()
    for node in root.xpath("//" + 'enrollment'):
        for e in node.xpath('descendant-or-self::*[not(*)]'):
            if e.attrib:
                print e.attrib
                print e.find('type')
                print e.find('.//type')
                print e.attrib['type']
                print e.find(e.attrib['type']).text
Using this method, I can pull out the type (e.g. Anticipated/Actual), but I can't find a way to pull out the number. If someone could suggest the print line I should use, I would appreciate it.
I did look at some similar questions (e.g. here) but their suggestions don't seem to work for me.
Upvotes: 2
Views: 1377
Reputation: 106
You are doing all the right things; just don't overcomplicate it. Put simply: get the root node using xpath, iterate over its nodes with getiterator (iter is its non-deprecated replacement), and read the value of each node from its .text attribute.
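Example, for a tree shaped like this:
parent
    child
    child
for i in parent.iter():  # iter() replaces the deprecated getiterator()
    print(i.tag)   # tag name of each element: parent first, then each child
    print(i.text)  # text content of that element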
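Applied to the element from the question, this could look something like the sketch below. It assumes the 'type' you want is the attribute on enrollment (read with .get) and the number is the element's text (read with .text). The wrapping study element, the SAMPLE string, and the get_enrollments helper are made up for illustration; only the enrollment line comes from the question. The stdlib fallback import is just so the snippet runs without lxml installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback offering the same calls

# Hypothetical sample document; only the <enrollment> line is from the question.
SAMPLE = '<study><enrollment type="Anticipated">30</enrollment></study>'


def get_enrollments(root):
    """Return (type, count) pairs for every <enrollment> element under root."""
    results = []
    for node in root.iter('enrollment'):
        kind = node.get('type')  # attribute value, or None if the attribute is absent
        count = node.text        # text between the tags, as a string
        results.append((kind, count))
    return results


root = etree.fromstring(SAMPLE)
enrollments = get_enrollments(root)
for kind, count in enrollments:
    print(kind, count)
```

For real files, root would come from etree.parse(each_file).getroot() inside the glob loop. Note that e.find('type') in the question returns None because find looks for a child element named type, whereas here type is an attribute of enrollment.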
Upvotes: 2