Reputation: 2516
First, here is the string:
'<?xml version="1.0" encoding="UTF-8"?><metalink version="3.0" xmlns="http://www.metalinker.org/" xmlns:lcgdm="LCGDM:" generator="lcgdm-dav" pubdate="Fri, 11 Oct 2013 12:46:10 GMT"><files><file name="/lhcb/L"><size>173272912</size><resources><url type="https">https://test-kit.test.de:2880/pnfs/test.file</url><url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url></resources></file></files></metalink>'
I want to extract the text of the url elements. The following code works, but it is fragile because the element path is hard-coded:
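import xml.etree.ElementTree as ET

root = ET.fromstring(xml_string)
# root[0][0][1] is <resources> (metalink > files > file > resources); loop over its <url> children
for entry in root[0][0][1]:
    print(entry.text)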
This only works as long as the XML structure never changes. I tried XPath expressions and searching by tag name, but I never got any results.
Is the problem the format of the XML string, or am I doing something wrong?
Upvotes: 3
Views: 290
Reputation: 3095
Your XML uses a namespace, so you need to include it in the XPath expression:
for entry in root.findall('.//{http://www.metalinker.org/}url'):
    print(entry.text)
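To see why searching by plain tag name returned nothing, here is a small sketch using the standard-library xml.etree.ElementTree module. ElementTree stores every tag in "{namespace-uri}localname" form, so an unqualified name like url never matches an element that lives in the default namespace. The names XML_STRING and NS are illustrative only; the string is the document from the question, split into pieces for readability.

```python
import xml.etree.ElementTree as ET

# The metalink document from the question, split into pieces for readability.
XML_STRING = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<metalink version="3.0" xmlns="http://www.metalinker.org/" '
    'xmlns:lcgdm="LCGDM:" generator="lcgdm-dav" '
    'pubdate="Fri, 11 Oct 2013 12:46:10 GMT">'
    '<files><file name="/lhcb/L"><size>173272912</size><resources>'
    '<url type="https">https://test-kit.test.de:2880/pnfs/test.file</url>'
    '<url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url>'
    '</resources></file></files></metalink>'
)

NS = "http://www.metalinker.org/"  # default namespace declared on <metalink>

root = ET.fromstring(XML_STRING)

# ElementTree stores tags as "{namespace-uri}localname", so the root tag
# is not just "metalink".
print(root.tag)  # {http://www.metalinker.org/}metalink

# An unqualified name does not match namespaced elements.
plain = root.findall(".//url")
print(plain)  # []

# Qualifying the tag with the namespace URI finds both <url> elements.
qualified = [e.text for e in root.findall(".//{%s}url" % NS)]
print(qualified)
```

So nothing is wrong with the XML string itself; the namespace is simply part of every tag name.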
Upvotes: 3
Reputation: 90889
You can use XPath (via the findall() method of Element) to get the URLs, but since you have declared xmlns="http://www.metalinker.org/" on the root element, you need to use that namespace in the XPath as well.
Example:
>>> from xml.etree.ElementTree import fromstring
>>> root = fromstring(xml_string)
>>> urls = root.findall('.//{http://www.metalinker.org/}url')
>>> for url in urls:
...     print(url.text)
...
https://test-kit.test.de:2880/pnfs/test.file
https://test.grid.sara.nl:2882/pnfs/test.file
The XPath above finds all url elements anywhere in the document.
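If repeating the full namespace URI in every path gets tedious, findall() also accepts a prefix-to-URI mapping as its second argument. A minimal sketch: the prefix ml is an arbitrary name chosen for illustration (it does not need to appear in the document), and on Python 3.8 and later the {*} wildcard matches the tag in any namespace.

```python
import sys
import xml.etree.ElementTree as ET

# The metalink document from the question, split into pieces for readability.
XML_STRING = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<metalink version="3.0" xmlns="http://www.metalinker.org/" '
    'xmlns:lcgdm="LCGDM:" generator="lcgdm-dav" '
    'pubdate="Fri, 11 Oct 2013 12:46:10 GMT">'
    '<files><file name="/lhcb/L"><size>173272912</size><resources>'
    '<url type="https">https://test-kit.test.de:2880/pnfs/test.file</url>'
    '<url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url>'
    '</resources></file></files></metalink>'
)

root = ET.fromstring(XML_STRING)

# "ml" is an illustrative prefix; findall() resolves "ml:url" through this map.
namespaces = {"ml": "http://www.metalinker.org/"}
urls = [e.text for e in root.findall(".//ml:url", namespaces)]

for url in urls:
    print(url)

# Python 3.8+ also understands "{*}", which matches any (or no) namespace.
if sys.version_info >= (3, 8):
    wildcard_urls = [e.text for e in root.findall(".//{*}url")]
    print(wildcard_urls == urls)  # True
```

The explicit mapping is the stricter choice: the wildcard would also pick up a url element from any other namespace, or from no namespace at all.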
Upvotes: 3