whoisearth

Reputation: 4160

XML scanning for value

I have an XML document with the following structure that I'm getting from an API:

<entry>
    <id>2397</id>
    <title>action_alert</title>
    <tes:actions>
        <tes:name>action_alert</tes:name>
        <tes:type>2</tes:type>
    </tes:actions>
</entry>

I am extracting the ID as follows:

from lxml import etree
import urllib2

sourceobject = etree.parse(urllib2.urlopen(fullsourceurl))
source_id = sourceobject.xpath('//id/text()')[0]

I also want to get the value of tes:type, but this:

source_type = sourceobject.xpath('//tes:actions/tes:type/text()')[0]

doesn't work. It gives the following error:

lxml.etree.XPathEvalError: Undefined namespace prefix

How do I get it to ignore the namespace?

Alternatively, I know the namespace, which is declared like this:

<tes:action xmlns:tes="http://www.blah.com/client/servlet">

Upvotes: 0

Views: 54

Answers (2)

har07

Reputation: 89285

The proper way to access nodes in a namespace is to pass a mapping from prefix to namespace URI as an additional argument to the xpath() method, for example:

ns = {'tes' : 'http://www.blah.com/client/servlet'}
source_type = sourceobject.xpath('//tes:actions/tes:type/text()', namespaces=ns)[0]
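
For anyone who wants to try the prefix mapping without the API, here is a minimal sketch using the standard library's xml.etree.ElementTree instead of lxml (a swapped-in parser, not the code from the question). Its find methods accept the same kind of prefix-to-URI dictionary. The xmlns:tes declaration on <entry> is an assumption added so the sample fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Sample document from the question. The xmlns:tes declaration on <entry>
# is an assumption so that the fragment is well-formed by itself.
xml_text = """<entry xmlns:tes="http://www.blah.com/client/servlet">
    <id>2397</id>
    <title>action_alert</title>
    <tes:actions>
        <tes:name>action_alert</tes:name>
        <tes:type>2</tes:type>
    </tes:actions>
</entry>"""

root = ET.fromstring(xml_text)

# The prefix used here only has to match the prefix in the path expression;
# what identifies the nodes is the namespace URI.
ns = {'tes': 'http://www.blah.com/client/servlet'}

source_id = root.findtext('id')
source_type = root.findtext('tes:actions/tes:type', namespaces=ns)

print(source_id, source_type)
```

The key in the dictionary does not need to be the same prefix the document uses; only the URI has to match.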

Another, less recommended way is to ignore namespaces entirely by matching on the XPath function local-name():

source_type = sourceobject.xpath('//*[local-name()="actions"]/*[local-name()="type"]/text()')[0]
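
To make it clearer what matching on local names does, here is a rough standard-library sketch of the same idea. It uses xml.etree.ElementTree rather than lxml, and the local_name helper is a hypothetical name introduced only for illustration. As above, the xmlns:tes declaration on <entry> is an assumption. Because the namespace URI is discarded, this would also match an actions/type pair from any other namespace, which is the main reason the approach is less recommended.

```python
import xml.etree.ElementTree as ET

# Sample document from the question; the xmlns:tes declaration is assumed.
xml_text = """<entry xmlns:tes="http://www.blah.com/client/servlet">
    <id>2397</id>
    <title>action_alert</title>
    <tes:actions>
        <tes:name>action_alert</tes:name>
        <tes:type>2</tes:type>
    </tes:actions>
</entry>"""


def local_name(element):
    """Return the tag without its namespace (hypothetical helper).

    ElementTree stores namespaced tags as '{uri}name', so everything
    after the closing brace is the local name.
    """
    return element.tag.rsplit('}', 1)[-1]


root = ET.fromstring(xml_text)

# Collect the text of every 'type' child of every 'actions' element,
# regardless of which namespace either element belongs to.
types = [
    child.text
    for parent in root.iter()
    if local_name(parent) == 'actions'
    for child in parent
    if local_name(child) == 'type'
]

source_type = types[0]
print(source_type)
```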

Upvotes: 1

Chris

Reputation: 140

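I'm not sure about the namespace handling, but I think it would be easier to use BeautifulSoup (here text is the XML response as a string):

from bs4 import BeautifulSoup

# Name the parser explicitly; html.parser treats "tes:actions" as a plain tag name
soup = BeautifulSoup(text, "html.parser")

# Collect the text of every <id> element
ids = []
for tag in soup.find_all("id"):
    ids.append(tag.text)

# ids is now ['2397']

# Collect the text of every <tes:type> inside a <tes:actions> element
types = []
for actions in soup.find_all("tes:actions"):
    # Avoid naming a variable "type", which would shadow the built-in
    for tag in actions.find_all("tes:type"):
        types.append(tag.text)

# types is now ['2']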

Upvotes: 1
