Raketemensch

Reputation: 31

Using lxml to parse xml with multiple namespaces

I'm pulling xml from a SOAP api that looks like this:

<SOAP-ENV:Envelope xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ae="urn:sbmappservices72" xmlns:c14n="http://www.w3.org/2001/10/xml-exc-c14n#" xmlns:diag="urn:SerenaDiagnostics" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:xenc="http://www.w3.org/2001/04/xmlenc#" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SOAP-ENV:Header/>
<SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id xsi:type="ae:ItemIdentifier">
            <ae:displayName/>
            <ae:id>10</ae:id>
            <ae:uuid>a9b91034-8f4d-4043-b9b6-517ba4ed3a33</ae:uuid>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
            <ae:issueId/>
          </ae:id>

I can't for the life of me use findall to pull something like tableId. Most of the tutorials on parsing using lxml don't include namespaces, but the one at lxml.de does, and I've been trying to follow it.

According to their tutorial you should create a dictionary of the namespaces, which I've done like so:

r = tree.xpath('/e:SOAP-ENV/s:ae', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

But that appears to not be working, as when I try to get the len of r, it comes back as 0:

print('length: ' + str(len(r)))  # <---- always equals 0

Since the URI for the second namespace is a "urn:", I tried using a real URL to the wsdl as well, but that gives me the same result.

Is there something obvious that I'm missing? I just need to be able to pull values like the one for tableIdItemId.

Any help would be greatly appreciated.

Upvotes: 3

Views: 719

Answers (1)

har07

Reputation: 89305

Your XPath doesn't correspond to the XML structure: each step has to name an element (such as Envelope or Body), not a namespace prefix like SOAP-ENV or ae. Try this instead:

r = tree.xpath('/e:Envelope/e:Body/s:GetItemsByQueryResponse/s:return/s:item/s:id/s:tableId', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

For small XML documents, you may want to use // instead of / to simplify the expression, for example:

r = tree.xpath('/e:Envelope/e:Body//s:tableId', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

/e:Body//s:tableId will find tableId no matter how deeply it is nested within Body. Note, however, that // is generally slower than /, especially on a large XML document.
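
Since the question mentions findall, here is a minimal sketch of the same idea using find/findall with a prefix map. The XML is a trimmed-down stand-in for the response in the question (only two namespaces declared, xsi:type dropped), and the fallback import is an assumption so the snippet also runs where lxml is not installed; the stdlib ElementTree accepts the same path syntax for these calls. The prefixes 'e' and 's' are arbitrary local names; only the URIs have to match the document.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: fall back to the standard library, whose find()/findall()
    # accept the same path syntax and namespaces argument.
    import xml.etree.ElementTree as etree

# Trimmed-down, well-formed stand-in for the SOAP response in the question.
XML = b"""<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:ae="urn:sbmappservices72">
  <SOAP-ENV:Header/>
  <SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id>
            <ae:id>10</ae:id>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
          </ae:id>
        </ae:item>
      </ae:return>
    </ae:GetItemsByQueryResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

root = etree.fromstring(XML)  # root is the Envelope element

# Prefix -> namespace URI; the prefixes need not match those in the XML.
NS = {'e': 'http://schemas.xmlsoap.org/soap/envelope/',
      's': 'urn:sbmappservices72'}

# Step to Body (a direct child of the root), then search all descendants.
body = root.find('e:Body', namespaces=NS)
table_ids = [el.text for el in body.findall('.//s:tableId', namespaces=NS)]

# Equivalent lookup without a prefix map, using {uri}localname notation.
item_ids = [el.text
            for el in root.findall('.//{urn:sbmappservices72}tableIdItemId')]

print(table_ids)
print(item_ids)
```

With lxml, the same NS dictionary can be passed unchanged to tree.xpath(..., namespaces=NS) as in the expressions above.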

Upvotes: 2
