Reputation: 11

Python XML Iteration

I am trying to iterate through XML from a Requests response. Right now my Python code looks like this:

import requests
from xml.etree import ElementTree

data = requests.post(url, data=xml, headers=headers).content
tree = ElementTree.fromstring(data)

And my XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetPasswordResponse xmlns="https://tempuri.org/">
    <GetPasswordResult>
    <Content>ThisisContent</Content>
    <UserName>ExampleName</UserName>
    <Address>ExServer</Address>
    <Database>tempdb</Database>
    <PolicyID>ExPolicy</PolicyID>
    <Properties>
        <KeyAndValue>
            <key>Content</key>
            <value>ThisisContent</value>
        </KeyAndValue>
        <KeyAndValue>
            <key>ReconcileIsWinAccount</key>
            <value>Yes</value>
        </KeyAndValue>
    </Properties>
    </GetPasswordResult>
</GetPasswordResponse>
</soap:Body></soap:Envelope>

How would I go about pulling out the values of the <Content>, <UserName>, and <PolicyID> tags using ElementTree? I have tried many different things but can't seem to access any of the values.

Upvotes: 1

Answers (2)

yazz

Reputation: 331

The simplified_scrapy library lets you select elements by tag name without having to deal with the XML namespace.

from simplified_scrapy import utils, SimplifiedDoc, req
xml = '''
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetPasswordResponse xmlns="https://tempuri.org/">
    <GetPasswordResult>
    <Content>ThisisContent</Content>
    <UserName>ExampleName</UserName>
    <Address>ExServer</Address>
    <Database>tempdb</Database>
    <PolicyID>ExPolicy</PolicyID>
    <Properties>
        <KeyAndValue>
            <key>Content</key>
            <value>ThisisContent</value>
        </KeyAndValue>
        <KeyAndValue>
            <key>ReconcileIsWinAccount</key>
            <value>Yes</value>
        </KeyAndValue>
    </Properties>
    </GetPasswordResult>
</GetPasswordResponse>
</soap:Body></soap:Envelope>
'''

# xml = req.post(url, data=xml, headers=headers)
doc = SimplifiedDoc(xml)
nodes = doc.select('GetPasswordResult').selects('Content|UserName|PolicyID')
print([(node.tag, node.text) for node in nodes])

Result:

[('Content', 'ThisisContent'), ('UserName', 'ExampleName'), ('PolicyID', 'ExPolicy')]
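
If adding a dependency is not an option, the standard library can probably give the same result. The following is a minimal sketch, assuming Python 3.8 or later, where ElementTree accepts the `{*}` wildcard to match a tag in any namespace. The XML here is a trimmed copy of the response from the question, and the variable names are only illustrative.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the response from the question (assumed to have the same shape).
xml = '''<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetPasswordResponse xmlns="https://tempuri.org/">
    <GetPasswordResult>
    <Content>ThisisContent</Content>
    <UserName>ExampleName</UserName>
    <PolicyID>ExPolicy</PolicyID>
    </GetPasswordResult>
</GetPasswordResponse>
</soap:Body></soap:Envelope>'''

root = ET.fromstring(xml)

# '{*}Tag' matches Tag regardless of its namespace (Python 3.8+).
result = [
    (tag, root.find('.//{*}' + tag).text)
    for tag in ('Content', 'UserName', 'PolicyID')
]
print(result)
```

Note that `find()` returns `None` when a tag is missing, so real code should check for that before reading `.text`.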

Upvotes: 0

user5386938

Reputation:

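That's a little tricky because the elements are in a namespace but have no prefix: GetPasswordResponse declares a default namespace (xmlns="https://tempuri.org/"), and all of its descendants inherit it. ElementTree still needs that namespace in the search path, so map it to a prefix of your own choosing (x below) and pass the mapping to find():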

from xml.etree import ElementTree as ET

data = '''\
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetPasswordResponse xmlns="https://tempuri.org/">
    <GetPasswordResult>
    <Content>ThisisContent</Content>
    <UserName>ExampleName</UserName>
    <Address>ExServer</Address>
    <Database>tempdb</Database>
    <PolicyID>ExPolicy</PolicyID>
    <Properties>
        <KeyAndValue>
            <key>Content</key>
            <value>ThisisContent</value>
        </KeyAndValue>
        <KeyAndValue>
            <key>ReconcileIsWinAccount</key>
            <value>Yes</value>
        </KeyAndValue>
    </Properties>
    </GetPasswordResult>
</GetPasswordResponse>
</soap:Body></soap:Envelope>
'''

tree = ET.fromstring(data)
nmsp = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'x': 'https://tempuri.org/',
}  # prefix-to-URI mapping; 'x' is an arbitrary prefix for the default namespace

print(tree.find('.//x:Content', namespaces=nmsp).text)
print(tree.find('.//x:UserName', namespaces=nmsp).text)
print(tree.find('.//x:PolicyID', namespaces=nmsp).text)
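
Since the question is about iterating, the same namespace mapping can also drive a loop. This is a rough sketch under the assumption that the response always has the shape shown in the question; the names `NS`, `values`, and `properties` are made up for illustration. It collects the three requested values into a dict and walks the KeyAndValue pairs with `iterfind()`.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the response from the question (assumed to have the same shape).
data = '''<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetPasswordResponse xmlns="https://tempuri.org/">
    <GetPasswordResult>
    <Content>ThisisContent</Content>
    <UserName>ExampleName</UserName>
    <PolicyID>ExPolicy</PolicyID>
    <Properties>
        <KeyAndValue>
            <key>Content</key>
            <value>ThisisContent</value>
        </KeyAndValue>
        <KeyAndValue>
            <key>ReconcileIsWinAccount</key>
            <value>Yes</value>
        </KeyAndValue>
    </Properties>
    </GetPasswordResult>
</GetPasswordResponse>
</soap:Body></soap:Envelope>'''

# 'x' is an arbitrary prefix standing in for the default namespace.
NS = {'x': 'https://tempuri.org/'}

tree = ET.fromstring(data)
result_node = tree.find('.//x:GetPasswordResult', NS)

# findtext() returns None instead of raising when a child is missing.
values = {}
if result_node is not None:
    for name in ('Content', 'UserName', 'PolicyID'):
        values[name] = result_node.findtext('x:' + name, namespaces=NS)

# Walk every KeyAndValue element and pair its key with its value.
properties = {
    kv.findtext('x:key', namespaces=NS): kv.findtext('x:value', namespaces=NS)
    for kv in tree.iterfind('.//x:KeyAndValue', NS)
}

print(values)
print(properties)
```

Searching from GetPasswordResult with a plain `x:Name` path only looks at its direct children, which avoids accidentally matching a same-named element deeper in the tree.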

Upvotes: 1
