Rietty

Reputation: 1156

Using BeautifulSoup to parse XML with tags that contain colon

I am struggling to parse some XML with BeautifulSoup. Although I've read the documentation, I can't get it to work with the way my XML is structured:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns:s="server url here">
    <!-- Feed elements -->
    <entry>
        <!-- Other Elements -->
        <content type="text/xml">
            <s:dict>
                <!-- Other keys. -->
                <s:key name="sid">DATA I WANT HERE</s:key>
                <!-- Other keys. -->
            </s:dict>
            <!-- Lots of other dicts here. -->
        </content>
    </entry>
    <!-- Other entries -->
</feed>

My goal is to obtain the data from every s:key whose name attribute has the value sid. (All s:key elements have a name, but only one per <entry> is named sid.)

How do I print the text inside each s:key whose name is sid?

What I've tried is:

print(tree.findAll('key', {'name'}))

as well as:

for elem in tree.feed.entry.content.dict.key:
    print(elem)

but neither of these works the way I want.

How can I extract this data?

Upvotes: 1

Views: 1275

Answers (1)

KunduK

Reputation: 33384

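Try the code below:

import bs4

# html_doc is assumed to hold the XML text from the question
soup = bs4.BeautifulSoup(html_doc, 'lxml')
# The lxml HTML parser keeps the prefix in the tag name, so match "s:key" literally
elements = soup.find_all("s:key", {"name": "sid"})
for element in elements:
    print(element.text)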

Output:

DATA I WANT HERE
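
If installing lxml is not an option, a namespace-aware alternative is the standard library's xml.etree.ElementTree. This is not BeautifulSoup, just a minimal sketch of the same lookup. The sample document and the namespace URI http://example.com/ns are placeholders I made up to mirror the question's structure; substitute the real value of xmlns:s from your feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the question's layout.
# The namespace URI is a placeholder, not the real server URL.
xml_doc = """<feed xmlns:s="http://example.com/ns">
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="other">ignored</s:key>
                <s:key name="sid">DATA I WANT HERE</s:key>
            </s:dict>
        </content>
    </entry>
</feed>"""

root = ET.fromstring(xml_doc)

# Map the "s" prefix to its URI so the path expression can use it.
ns = {"s": "http://example.com/ns"}

# Find every s:key at any depth whose name attribute equals "sid".
sids = [key.text for key in root.findall(".//s:key[@name='sid']", ns)]

for sid in sids:
    print(sid)
```

Here the prefix is resolved through the namespace mapping rather than matched as literal text, so the lookup still works if the feed declares the same namespace under a different prefix.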

Upvotes: 3
