Reputation: 6331
My question is regarding how to get information stored in a tag which allows for no closing tag. Here's the relevant xml:
<?xml version="1.0" encoding="UTF-8"?>
<uws:job>
<uws:results>
<uws:result id="2014-03-03T15:42:31:1337" xlink:href="http://www.cosmosim.org/query/index/stream/table/2014-03-03T15%3A42%3A31%3A1337/format/csv" xlink:type="simple"/>
</uws:results>
</uws:job>
I'm looking to extract the xlink:href url here. As you can see the uws:result tag requires no closing tag. Additionally, having the 'uws:' makes it a bit tricky to handle them when working in python. Here's what I've tried so far:
from lxml import etree
root = etree.fromstring(xmlresponse.content)
url = root.find('{*}results').text
Where xmlresponse.content is the xml data to be parsed. What this returns is
'\n '
which indicates that it's only finding the newline character, since what I'm really after is contained within a tag inside the results tag. Any ideas would be greatly appreciated.
Upvotes: 3
Views: 415
Reputation: 12401
You found the right node; you extracted the data incorrectly. Instead of
url = root.find('{*}results').text
you really want
url = root.find('{*}results').get('attribname', 'value_to_return_if_not_present')
or
url = root.find('{*}results').attrib['attribname']
(which will throw an exception if not present).
Because of the namespace on the attribute itself, you will probably need to use the {ns}attrib
syntax to look it up too.
You can dump out the attrib dictionary and just copy the attribute name out too.
text
is actually the space between elements, and is not normally used but is supported both for spacing (like etreeindent) and some special cases.
Upvotes: 2