Reputation: 4160
I have an XML with the following structure that I'm getting from an API -
<entry>
<id>2397</id>
<title>action_alert</title>
<tes:actions>
<tes:name>action_alert</tes:name>
<tes:type>2</tes:type>
</tes:actions>
</entry>
I am scanning for the ID by doing the following -
sourceobject = etree.parse(urllib2.urlopen(fullsourceurl))
source_id = sourceobject.xpath('//id/text()')[0]
I also want to get the value of tes:type, so I tried this -
source_type = sourceobject.xpath('//tes:actions/tes:type/text()')[0]
It doesn't work. It gives the following error -
lxml.etree.XPathEvalError: Undefined namespace prefix
How do I get it to ignore the namespace?
Alternatively, I know the namespace which is this -
<tes:action xmlns:tes="http://www.blah.com/client/servlet">
Upvotes: 0
Views: 54
Reputation: 89285
The proper way to access nodes in a namespace is to pass a prefix-to-namespace-URI mapping as an additional argument to the xpath() method, for example:
ns = {'tes': 'http://www.blah.com/client/servlet'}
source_type = sourceobject.xpath('//tes:actions/tes:type/text()', namespaces=ns)[0]
Or, another way which is less recommended, is to literally ignore namespaces using the XPath function local-name():
source_type = sourceobject.xpath('//*[local-name()="actions"]/*[local-name()="type"]/text()')[0]
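For anyone who wants to try the prefix mapping without hitting the API, here is a minimal sketch. Note that it swaps in the standard library's xml.etree.ElementTree instead of lxml, so it uses findtext() rather than xpath(); the namespaces dict works the same way. The sample document is the one from the question, with the xmlns:tes declaration placed on <entry> as an assumption (the real feed may declare it on a different element).

```python
import xml.etree.ElementTree as ET

# Sample from the question. Putting xmlns:tes on <entry> is an assumption
# made so the snippet parses on its own.
SAMPLE = """<entry xmlns:tes="http://www.blah.com/client/servlet">
<id>2397</id>
<title>action_alert</title>
<tes:actions>
<tes:name>action_alert</tes:name>
<tes:type>2</tes:type>
</tes:actions>
</entry>"""

# The prefix used in the query only has to match a key in this dict;
# what actually identifies the namespace is the URI.
ns = {"tes": "http://www.blah.com/client/servlet"}

root = ET.fromstring(SAMPLE)

# <id> is not in any namespace, so no mapping is needed for it.
source_id = root.findtext("id")

# Namespaced elements are resolved through the mapping.
source_type = root.findtext("tes:actions/tes:type", namespaces=ns)

print(source_id, source_type)
```

The key in the dict does not have to be the same prefix the document uses; only the URI has to match.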
Upvotes: 1
Reputation: 140
I'm not exactly sure about the namespace thing, but I think it would be easier to use BeautifulSoup (text is the XML text):
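from bs4 import BeautifulSoup
# html.parser keeps "tes:actions" as a literal tag name, so no namespace mapping is needed
soup = BeautifulSoup(text, "html.parser")
# collect the text of every <id> tag
ids = []
for tag in soup.find_all("id"):
    ids.append(tag.text)
# ids is now ['2397']
# collect the text of every <tes:type> nested in a <tes:actions> tag
types = []
for actions in soup.find_all("tes:actions"):
    for tag in actions.find_all("tes:type"):
        types.append(tag.text)
# types is now ['2']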
Upvotes: 1