Reputation: 237
Hopefully this is a quick answer for those experienced. I have a single XML file that contains a URL. I'd like to take the URL from the XML and pass it to a downloader script I've written. My only issue is that I can't seem to parse just the URL out of the XML. This is what it looks like:
<program new-version="1.1.1.1" name="ProgramName">
<download-url value="http://website.com/file.exe"/>
</program>
Thanks in advance!
Upvotes: 1
Views: 7901
Reputation: 49567
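from lxml import etree
# parse() accepts a file name, a file object, or a URL
et = etree.parse("your xml file or url")
# xpath() returns a list with the value attribute of every <download-url>
value = et.xpath('//download-url/@value')
print("".join(value))
# output: http://website.com/file.exe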
You can also use cssselect (in recent lxml releases it is a separate package: pip install cssselect):
import lxml.html
# read the whole file into a single string
with open("your xml file") as f:
    values = f.read()
doc = lxml.html.fromstring(values)
# CSS selector; the sample XML has no <document> element, so start at <program>
elements = doc.cssselect('program download-url')
print(elements[0].get('value'))
# output: http://website.com/file.exe
Upvotes: 1
Reputation: 287835
>>> code = '''<program new-version="1.1.1.1" name="ProgramName">
... <download-url value="http://website.com/file.exe"/>
... </program>'''
With lxml:
>>> import lxml.etree
>>> lxml.etree.fromstring(code).xpath('//download-url/@value')[0]
'http://website.com/file.exe'
With the built-in xml.etree.ElementTree:
>>> import xml.etree.ElementTree
>>> doc = xml.etree.ElementTree.fromstring(code)
>>> doc.find('.//download-url').attrib['value']
'http://website.com/file.exe'
With the built-in xml.dom.minidom:
>>> import xml.dom.minidom
>>> doc = xml.dom.minidom.parseString(code)
>>> doc.getElementsByTagName('download-url')[0].getAttribute('value')
u'http://website.com/file.exe'
Which one you pick is entirely up to you. lxml needs to be installed separately, but it is the fastest and most feature-rich of the three. xml.etree.ElementTree has a funky interface, and its XPath support is limited (how limited depends on the version of the Python standard library). xml.dom.minidom does not support XPath and tends to be slower, but it implements the cross-platform DOM API.
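If you want to hand the result to a downloader script, it may help to wrap the lookup in a small function that does not blow up when the element or attribute is missing. Here is a minimal sketch using only the standard library. The names get_download_url and SAMPLE are made up for illustration, and it assumes the XML has the same shape as in the question.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = '''<program new-version="1.1.1.1" name="ProgramName">
<download-url value="http://website.com/file.exe"/>
</program>'''


def get_download_url(xml_text):
    """Return the value attribute of the first <download-url>, or None."""
    root = ET.fromstring(xml_text)
    # './/download-url' matches the element at any depth below the root.
    node = root.find('.//download-url')
    if node is None:
        return None
    # .get() returns None instead of raising KeyError if the attribute is absent.
    return node.get('value')


if __name__ == '__main__':
    print(get_download_url(SAMPLE))
```

The caller can then check for None before starting the download, instead of catching an IndexError or KeyError from one of the one-liners above.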
Upvotes: 9