Reputation: 141
I'm trying to process an XML file using XPath in Python with lxml.
I can pull out the values at a particular level of the tree using this code:
from lxml import etree

file_name = input('Enter the file name, including .xml extension: ')  # User inputs file name
print('Parsing ' + file_name)
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
r = tree.xpath('/dataimport/programmelist/programme')
print(len(r))
with open(file_name + '.log', 'w', encoding='utf-8') as f:
    for r in tree.xpath('/dataimport/programmelist/programme'):
        progid = r.get("id")
        print(progid)
This returns the list of id values as expected. I also want the value of an attribute on a child element (where the child exists), but I can't work out how. I can only get it as a separate list, and I need to keep each value linked to its parent.
Note: I will be writing the values out to a log file, but since I haven't been successful in getting everything out that I want, I haven't added the 'write out' code yet.
This is the structure of the XML:
<dataimport dtdversion="1.1">
  <programmelist>
    <programme id="eid-273168">
      <imageref idref="img-1844575"/>
How can I get Python to return the id + idref?
The previous examples I have worked with had namespaces, but this file doesn't.
Upvotes: 0
Views: 1707
Reputation: 2041
The xpath() method returns a list of elements, and each element has its own xpath() method, so you can call it again with a relative path to get the idref list for that programme:
for r in tree.xpath('/dataimport/programmelist/programme'):
    progid = r.get("id")
    # Relative path: evaluated against this programme element only
    ref_list = r.xpath('imageref/@idref')
    print(progid, ref_list)
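To make the "where it exists" case concrete, here is a minimal self-contained sketch. The inline XML is hypothetical sample data modelled on the structure in the question (the second programme, its id, and the closing tags are assumptions added for illustration), and the names `pairs` and `prog` are mine. It keeps each id paired with its idref and uses None when there is no imageref child.

```python
from lxml import etree

# Hypothetical sample data modelled on the question's structure.
# The second <programme> deliberately has no <imageref> child.
xml = b"""<dataimport dtdversion="1.1">
  <programmelist>
    <programme id="eid-273168">
      <imageref idref="img-1844575"/>
    </programme>
    <programme id="eid-000002"/>
  </programmelist>
</dataimport>"""

root = etree.fromstring(xml)

pairs = []
for prog in root.xpath('/dataimport/programmelist/programme'):
    progid = prog.get('id')
    # The relative XPath only looks at children of this <programme>.
    # It returns a list of attribute values, empty if there is no child.
    idrefs = prog.xpath('imageref/@idref')
    pairs.append((progid, idrefs[0] if idrefs else None))

for progid, idref in pairs:
    print(progid, idref)
```

If a programme can have several imageref children, keeping the whole `idrefs` list instead of only the first item may be the better choice.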
Upvotes: 1