pythonNinja

Reputation: 499

Parsing an XML response with nested tags in Python

I have a response from a backend API that returns the XML below. I want to extract the pid value "1664953412.79414".

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" shp_request_proxied_from="3DB91F64-892E-4DB2-9271-C5CB5CAFBFBB">
    <title>jobs</title>
    <updated>2022-10-05T10:48:30-07:00</updated>
    <author>
        <name>Splunk</name>
    </author>
    <opensearch:totalResults>1</opensearch:totalResults>
    <entry>
        <published>2022-10-05T00:03:34.000-07:00</published>
        <author>
            <name>abc-pull</name>
        </author>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>

I have tried various approaches, but I have not been able to extract the data.

from xml.dom import minidom
pid = minidom.parseString(response.text).getElementsByTagName('pid')[0].childNodes[0].nodeValue

Then I tried this:

import xml.etree.ElementTree as ET
root = ET.fromstring(response.text)
print(root.tag)
print(root.find('entry')) 

But I don't get the entry tag data properly either. Can someone please help? Note: I cannot use xmltodict, as it is not available in my enterprise packages.
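For reference, the minidom attempt fails because getElementsByTagName matches element names, and "pid" is the value of the name attribute on an <s:key> element, not a tag. The lookup therefore returns an empty NodeList and the [0] index raises IndexError. A minimal standard-library sketch of the minidom approach, assuming the response body is the Atom feed shown above (trimmed here to the elements on the path to the pid), looks up the <s:key> elements by namespace and filters on that attribute. The names extract_pid_minidom, RESPONSE_XML and SPLUNK_NS are illustrative only.

```python
from xml.dom import minidom

# Trimmed copy of the response from the question; in real code pass response.text.
# The XML declaration must be the very first thing in the string.
RESPONSE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>"""

# Namespace URI bound to the "s:" prefix in the response.
SPLUNK_NS = "http://dev.splunk.com/ns/rest"


def extract_pid_minidom(xml_text):
    """Return the text of <s:key name="pid">, or None if it is missing."""
    doc = minidom.parseString(xml_text)
    # "pid" is an attribute value, not a tag name, so fetch the <s:key>
    # elements by namespace URI + local name and filter on the name attribute.
    for key in doc.getElementsByTagNameNS(SPLUNK_NS, "key"):
        if key.getAttribute("name") == "pid":
            # Join the text child nodes in case the value is split across several.
            return "".join(
                node.data for node in key.childNodes
                if node.nodeType == node.TEXT_NODE
            ).strip()
    return None


print(extract_pid_minidom(RESPONSE_XML))
```

Because the lookup goes by namespace URI rather than the literal "s:" prefix, it keeps working if the server ever binds that namespace to a different prefix.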

Answers (1)

Md. Fazlul Hoque

Reputation: 16187

You can use BeautifulSoup to pull the text of the s:key tag whose name attribute is "pid"; it is well suited to parsing both HTML and XML documents.

xml_doc = '''
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" shp_request_proxied_from="3DB91F64-892E-4DB2-9271-C5CB5CAFBFBB">
    <title>jobs</title>
    <updated>2022-10-05T10:48:30-07:00</updated>
    <author>
        <name>Splunk</name>
    </author>
    <opensearch:totalResults>1</opensearch:totalResults>
    <entry>
        <published>2022-10-05T00:03:34.000-07:00</published>
        <author>
            <name>abc-pull</name>
        </author>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>
'''

from bs4 import BeautifulSoup
# Raw string: the backslash escapes the colon in "s:key" for the CSS selector
pid = BeautifulSoup(xml_doc, 'lxml').select_one(r's\:key[name="pid"]').text
print(pid)

Output:

1664953412.79414
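Since bs4 and lxml are third-party packages, they may be unavailable for the same reason as xmltodict. The standard library's ElementTree, used in the question's second attempt, also works once namespaces are handled: the feed declares a default namespace (xmlns="http://www.w3.org/2005/Atom"), so <entry> is really {http://www.w3.org/2005/Atom}entry and root.find('entry') returns None. A minimal sketch, assuming the same response (trimmed to the relevant elements); the "atom" prefix and the names extract_pid_etree, RESPONSE_XML and NS are illustrative choices, and only the namespace URIs have to match the document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question; in real code pass response.text.
RESPONSE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>"""

# Prefix -> URI map for ElementTree paths. The prefixes are our own choice;
# ElementTree matches on the URIs, not on the prefixes used in the document.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "s": "http://dev.splunk.com/ns/rest",
}


def extract_pid_etree(xml_text):
    """Return the text of <s:key name="pid">, or None if it is missing."""
    root = ET.fromstring(xml_text)
    # <entry> and <content> inherit the default Atom namespace, so they need
    # the atom prefix; <s:dict> and <s:key> live in the Splunk namespace.
    key = root.find("atom:entry/atom:content/s:dict/s:key[@name='pid']", NS)
    return key.text.strip() if key is not None and key.text else None


root = ET.fromstring(RESPONSE_XML)
print(root.tag)                                 # {http://www.w3.org/2005/Atom}feed
print(root.find("entry"))                       # None: unqualified name does not match
print(root.find("atom:entry", NS) is not None)  # True
print(extract_pid_etree(RESPONSE_XML))          # 1664953412.79414
```

On Python 3.8 and later, a path such as root.find('{*}entry') also matches an element in any namespace, which can be convenient when you would rather not hard-code the URIs.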
